Hello,
Could you share the output of spumoni when you ran this command: `spumoni run -r <ref_file> -p <read_file> -P -f`?
The output files should be in the same directory as the read file.
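For concreteness, here is a minimal sketch of that invocation. The directory layout and file names below are hypothetical placeholders; the only details taken from this thread are the flags, that the index files sit next to the reference, and that the pseudo-length output lands next to the reads file.

```sh
# Hypothetical paths -- substitute your own reference and reads files.
./spumoni run -r ref/pangenomehuman.fasta -p reads/my_reads.fastq -P -f

# The index/temporary files are written next to the reference, while the
# per-read output ending in .pseudo_lengths should appear next to the reads:
ls reads/*.pseudo_lengths
```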
This README describes the input for the `analyze_pml.py` script. It essentially requires two different `*.pseudo_lengths` files to compare.
Here are all the files generated when I run spumoni with my reference file `pangenomehuman.fasta`:
Okay, so that looks like the directory where all the index files are stored; most of them are temporary files. In the directory that contains your reads file, you should see a file that ends with `*.pseudo_lengths`. Do you see it there?
Typically, I put the reference file in one folder, and the reads file in another. It seems like that might be what you did as well, since I don't see any files in that screenshot that appear to be the reads.
Sorry for the late reply -- I can now generate the pseudo-length file. Now I'm having an issue with calculating the matching statistics: each of my pseudo-length files is 62 GB, and I'm not able to run `analyze_pml.py` on my server with 225 GB of RAM.
I see. Is there a particular error that you see that makes you think you cannot run `analyze_pml.py`?
One way around this for now is to take a small portion of the `*.pseudo_lengths` file by running something like `head -n 2000 *.pseudo_lengths`, which extracts the results for the first 1000 reads, since each read's results consist of two lines. Then you can run `analyze_pml.py` on that smaller file. It doesn't have to be 1000 reads; that is just an example.
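As a rough sketch of that subsampling step (the file names here are placeholders, and the exact arguments to `analyze_pml.py` are documented in its README rather than shown here):

```sh
# Placeholder file names -- use your actual .pseudo_lengths outputs.
# Each read's results take two lines, so 2000 lines covers the first ~1000 reads.
head -n 2000 sample1.fastq.pseudo_lengths > sample1_subset.pseudo_lengths
head -n 2000 sample2.fastq.pseudo_lengths > sample2_subset.pseudo_lengths

# analyze_pml.py can then be pointed at the two subset files, following
# the arguments described in its README.
```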
In the coming weeks, I plan on integrating `analyze_pml.py` into the main SPUMONI code, as well as adding some code that will make those `*.pseudo_lengths` files smaller.
Taking a small portion of the `*.pseudo_lengths` file works! I think it is just the size of my data that is causing the job running `analyze_pml.py` to get killed.
That is great, I'll close the issue then. Like I mentioned above, I hope to make some commits in the coming weeks to make the process a little more streamlined and less memory intensive for large datasets.
I ran spumoni with this command: `./spumoni run -r <ref_file> -p <read_file> -P -f`. None of the output files ended with `pseudo_lengths`. Which files should I feed into `analyze_pml.py`?