marbl / harvest


majority of input genomes get ignored #32

Open karchern opened 6 years ago

karchern commented 6 years ago

Hi,

I ran parsnp in the fasta-reference mode only (so no genbank file). I put more than 500 query genomes into a folder, but only 36 of them were actually taken into account in the analysis.

Is that a known issue? What should I do?

Thank you!

Edit: I just reran the exact same command, and now 29 instead of 36 genomes were analyzed. I am utterly at a loss.

ghost commented 6 years ago

I'm getting the same issue. It's completely unpredictable how many genomes actually end up in the report. Did you figure anything out about this?

karchern commented 6 years ago

There is a flag in parsnp (I think it is '-c') which declares that your input genomes are curated, meaning that no checks are run on them and none get filtered out. Try running parsnp with this flag and see if it works for you.
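
For illustration, here is a minimal sketch of how such a call could be put together from Python. It only builds and prints the command line, it does not run anything. The file and folder names are made up, and the '-r' (fasta reference) and '-d' (genome directory) options are my assumption about parsnp's usage, so check them against `parsnp -h` first.

```python
import shlex


def build_parsnp_command(reference, genome_dir, curated=True):
    """Assemble a parsnp command line as an argument list (nothing is executed).

    Assumed options: -r = fasta reference, -d = directory of query genomes,
    -c = treat the input genomes as curated (skip the checks on them).
    """
    cmd = ["parsnp", "-r", str(reference), "-d", str(genome_dir)]
    if curated:
        cmd.append("-c")
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_parsnp_command("reference.fasta", "genomes/")
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` once the options are confirmed for your parsnp version.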

ghost commented 6 years ago

Thank you, it works.

It seems the problem was actually caused by genomes that are only distantly related to each other, which results in a low core genome percentage.