wshuai294 / PStrain

PStrain profiles strains in metagenomics data. It infers strain abundance and genotype for each species. It also has a single-species mode in which, given a BAM and a VCF, it can phase the variants for any species.
MIT License

Use existing metaphlan3 run #10

Closed (andrewjmc closed this issue 2 years ago)

andrewjmc commented 2 years ago

Hello,

Is it possible to accelerate this useful tool when metaphlan3 has already been run on samples?

Thanks,

Andrew

wshuai294 commented 2 years ago

Dear Andrew,

I think you raised a very important point, so I have added a function to do this.

Specifically, you can build a file that lists the metaphlan3 result file of each sample, one per line. (Note that "--tax_lev s" should be added when running metaphlan3.) Then pass this file to PStrain_V30.py with the parameter "--metaphlan3_output_files". You can look at the "readme" for details.
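A minimal sketch of building such a list file, for anyone with many samples. The directory layout, the `*_profile.txt` naming pattern, the use of absolute paths, and the helper name `write_profile_list` are all assumptions for illustration and are not part of PStrain; the readme remains the authority on the expected format and sample order.

```python
from pathlib import Path


def write_profile_list(profile_dir, list_path, pattern="*_profile.txt"):
    """Write one metaphlan3 result path per line to list_path.

    profile_dir, pattern and the output location are hypothetical;
    adjust them to wherever your metaphlan3 results actually live.
    """
    # Sort so the order of samples is reproducible between runs.
    profiles = sorted(Path(profile_dir).glob(pattern))
    if not profiles:
        raise FileNotFoundError(
            f"no files matching {pattern!r} found in {profile_dir}"
        )
    with open(list_path, "w") as handle:
        for profile in profiles:
            # Absolute paths are an assumption, chosen so the list
            # still works when PStrain is started from another directory.
            handle.write(f"{profile.resolve()}\n")
    return len(profiles)
```

The resulting file can then be given to PStrain_V30.py through "--metaphlan3_output_files", alongside the other arguments described in the readme.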

Please let me know if it helps. Best, Shuai

andrewjmc commented 2 years ago

This is great, thanks for the quick response and commit! Next question: given that metaphlan runs already do a bowtie alignment against the marker gene reference database, is it possible to use these in the pipeline?

andrewjmc commented 2 years ago

I'm currently trying the pipeline out with the pre-existing metaphlan files. Seems to have worked fine so far!

wshuai294 commented 2 years ago

Hello,

Happy to hear this!

As for the suggestion to reuse the bowtie alignments: aligning the reads to the sample-specific marker genes is already fast enough, so I don't intend to optimize this right now. Thanks a lot for the suggestion anyway.

Best, Shuai

andrewjmc commented 2 years ago

Hi Shuai,

This makes sense. The run worked perfectly, and it was very convenient to re-use metaphlan runs. (I had not run metaphlan at species level only, so I grepped out the s__ lines and removed all of the higher-order taxonomic terms from the lineages, which seemed to work fine.) I am now enjoying stretching my mind around the results!
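For anyone in the same position, the filtering step can be sketched roughly as below. This is an approximation rather than the exact commands used: it assumes the usual tab-separated metaphlan profile with `#` header lines and `|`-separated lineages in the first column, and the function name `to_species_level` is made up for illustration. Only the lineage column is shortened; the other columns are left as they are.

```python
def to_species_level(lines):
    """Keep only species-level rows and shorten each lineage to 's__...'.

    Assumes a tab-separated metaphlan profile whose first column is a
    '|'-separated lineage such as 'k__Bacteria|...|s__Some_species'.
    """
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            kept.append(line)  # keep header/comment lines unchanged
            continue
        fields = line.split("\t")
        # The last lineage term tells us the taxonomic level of the row.
        last_term = fields[0].split("|")[-1]
        if last_term.startswith("s__"):
            # Drop the higher-order terms, keeping only the species name.
            fields[0] = last_term
            kept.append("\t".join(fields))
    return kept
```

Rows that end in a strain-level `t__` term, or in any level above species, are dropped because their last lineage term does not start with `s__`.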

Thanks,

Andrew