hoelzer / pocp

Calculation of the Percentage of Conserved Proteins following Qin, Xie et al. 2014 but using DIAMOND instead of BLASTP for alignments.
GNU General Public License v3.0
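
For readers new to the metric: in Qin, Xie et al. 2014, POCP between two genomes is defined as (C1 + C2) / (T1 + T2) × 100, where C1 and C2 are the numbers of conserved proteins in each genome and T1 and T2 are the total numbers of proteins. A protein counts as conserved when its best hit in the other proteome has an E-value below 1e-5, more than 40% identity, and an alignable region covering more than 50% of the query. The sketch below only illustrates the arithmetic. The function name and the counts are made up, and whether this pipeline applies exactly the same thresholds to its DIAMOND hits is not stated in this thread.

```python
def pocp(c1, c2, t1, t2):
    """Percentage of conserved proteins: (C1 + C2) / (T1 + T2) * 100.

    c1, c2: conserved protein counts for genome 1 and genome 2
    t1, t2: total protein counts for genome 1 and genome 2
    """
    if t1 + t2 == 0:
        raise ValueError("at least one proteome must contain proteins")
    return 100.0 * (c1 + c2) / (t1 + t2)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
value = pocp(c1=850, c2=870, t1=1000, t2=1000)
print(value)
```

Because the formula sums over both directions, the value is the same regardless of which genome is treated as the query, which matches the symmetric matrices shown in the comments below.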

Calculate POCP for 1-vs-all | Nextflow support #13

Closed by garrison-chen 10 months ago

garrison-chen commented 1 year ago

Thanks for the great tool! I've been using it for a while. Does the Nextflow implementation also support the 1-vs-all calculation, as in release 1.1.1 (the previous Ruby implementation, before Nextflow)? If yes, how should we call the program? Many thanks!

Best, Chen

hoelzer commented 11 months ago

Hey @garrison-chen, sorry for the late reply! It's hard for me to find time to look into this POCP pipeline, but let me see what I can do about a 1-vs-all comparison instead of all-vs-all.

hoelzer commented 11 months ago

Hey,

I have now added a one-vs-all mode. It is activated automatically when you specify a single genome (--genome) or protein FASTA (--protein) in addition to the default input (--genomes or --proteins). In that case, comparisons are only made between the additional genome/protein FASTA and all other inputs.
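
To make the difference between the two modes concrete, here is a minimal sketch of which pairs get compared in each case. It is not the pipeline's actual code: the function name `comparison_pairs` is hypothetical, and the sketch assumes the target may also be matched by the --genomes/--proteins glob, as in the example commands in this thread.

```python
from itertools import combinations


def comparison_pairs(all_ids, target=None):
    """Return the pairs of inputs that would be compared.

    Without a target, every unordered pair is compared (all-vs-all).
    With a target, only pairs containing the target are compared (one-vs-all).
    """
    ids = set(all_ids)
    if target is not None:
        ids.add(target)  # the target may or may not already be in the main input
    ids = sorted(ids)
    if target is None:
        return list(combinations(ids, 2))
    return [(target, other) for other in ids if other != target]


genomes = ["Cav_10DC88", "Cav_11DC096", "Cga_08-1274-3", "Cga_12-4358", "Ctr_A-HAR-13"]
all_pairs = comparison_pairs(genomes)
one_pairs = comparison_pairs(genomes, target="Cav_10DC88")
print(len(all_pairs), "pairs in all-vs-all mode")
print(len(one_pairs), "pairs in one-vs-all mode")
```

For n inputs this reduces the number of comparisons from n(n-1)/2 to n-1, which is the main reason to use the mode on larger input sets.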

For example:

nextflow pull hoelzer/pocp
# Currently the changes are in the branch "one-vs-all" for testing
nextflow run hoelzer/pocp -r one-vs-all --genomes 'example/*.fasta' --genome example/Cav_10DC88.fasta -profile local,docker

will give you

❯ cat results/pocp-matrix.tsv
ID             Cav_10DC88  Cav_11DC096  Cga_08-1274-3  Cga_12-4358  Ctr_A-HAR-13
Cav_10DC88     100.0       98.9172      96.5928        96.4865      83.171
Cav_11DC096    98.9172     100.0        0.0            0.0          0.0
Cga_08-1274-3  96.5928     0.0          100.0          0.0          0.0
Cga_12-4358    96.4865     0.0          0.0            100.0        0.0
Ctr_A-HAR-13   83.171      0.0          0.0            0.0          100.0
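
In one-vs-all mode only the target's row and column carry computed values. The 0.0 entries elsewhere appear to be placeholders for comparisons that were skipped, which is an inference from the description above rather than something stated in the thread. A minimal sketch for pulling the target's values out of such a matrix follows. It assumes the file is tab-separated (suggested by the .tsv extension), embeds the matrix as a string so the snippet is self-contained, and uses a hypothetical helper name, `one_vs_all_row`.

```python
import csv
import io

# Matrix copied from the output above; tab separation is an assumption.
MATRIX_TSV = "\n".join([
    "ID\tCav_10DC88\tCav_11DC096\tCga_08-1274-3\tCga_12-4358\tCtr_A-HAR-13",
    "Cav_10DC88\t100.0\t98.9172\t96.5928\t96.4865\t83.171",
    "Cav_11DC096\t98.9172\t100.0\t0.0\t0.0\t0.0",
    "Cga_08-1274-3\t96.5928\t0.0\t100.0\t0.0\t0.0",
    "Cga_12-4358\t96.4865\t0.0\t0.0\t100.0\t0.0",
    "Ctr_A-HAR-13\t83.171\t0.0\t0.0\t0.0\t100.0",
])


def one_vs_all_row(tsv_text, target):
    """Return {other_id: POCP} for the target's row, skipping the self-comparison."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if row["ID"] == target:
            return {
                name: float(value)
                for name, value in row.items()
                if name not in ("ID", target)
            }
    raise KeyError(f"{target} not found in matrix")


pocp_values = one_vs_all_row(MATRIX_TSV, "Cav_10DC88")
# Print the other genomes from most to least similar to the target.
for name, value in sorted(pocp_values.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}\t{value}")
```

To read a real result instead, replace `MATRIX_TSV` with the contents of the matrix file in the results directory.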

Or, if I switch the "target genome":

nextflow run hoelzer/pocp -r one-vs-all --genomes 'example/*.fasta' --genome example/Cga_08-1274-3.fasta -profile local,docker -resume

I will get:

❯ cat results/pocp-matrix.tsv
ID             Cav_10DC88  Cga_08-1274-3  Cav_11DC096  Cga_12-4358  Ctr_A-HAR-13
Cav_10DC88     100.0       96.5928        0.0          0.0          0.0
Cga_08-1274-3  96.5928     100.0          97.1207      99.8894      83.9513
Cav_11DC096    0.0         97.1207        100.0        0.0          0.0
Cga_12-4358    0.0         99.8894        0.0          100.0        0.0
Ctr_A-HAR-13   0.0         83.9513        0.0          0.0          100.0

It also works if you directly give protein FASTAs as input, skipping the annotation step:

nextflow run hoelzer/pocp -r one-vs-all --proteins 'example/*.faa' --protein example/Cga_08-1274-3.faa -profile local,docker -resume
❯ cat results/pocp-matrix.tsv
ID             Cav_10DC88  Cga_08-1274-3  Cmu_Nigg  Cps_6BC  Ctr_D-UW-3-CX
Cav_10DC88     100.0       96.7532        0.0       0.0      0.0
Cga_08-1274-3  96.7532     100.0          83.5196   90.0476  84.1402
Cmu_Nigg       0.0         83.5196        100.0     0.0      0.0
Cps_6BC        0.0         90.0476        0.0       100.0    0.0
Ctr_D-UW-3-CX  0.0         84.1402        0.0       0.0      100.0

Finally, here is a mixed command using a set of genomes as input and comparing them one-vs-all against a given protein multi-FASTA:

nextflow run hoelzer/pocp -r one-vs-all --genomes 'example/*.fasta' --protein example/Cga_08-1274-3.faa -profile local,docker -resume
❯ cat results/pocp-matrix.tsv
ID             Cav_10DC88  Cga_08-1274-3  Cav_11DC096  Cga_12-4358  Ctr_A-HAR-13
Cav_10DC88     100.0       96.5405        0.0          0.0          0.0
Cga_08-1274-3  96.5405     100.0          97.067       99.8895      83.9049
Cav_11DC096    0.0         97.067         100.0        0.0          0.0
Cga_12-4358    0.0         99.8895        0.0          100.0        0.0
Ctr_A-HAR-13   0.0         83.9049        0.0          0.0          100.0

Can you test it please, @garrison-chen?

If everything works, I will merge that into the main branch and do another release.

Cheers, Martin

garrison-chen commented 10 months ago

Hi Martin,

Thanks a lot for the follow-up! I have tested it and so far everything works from my side. It's an amazing tool!

Best, Chen

hoelzer commented 10 months ago

Great, happy to hear that! Thanks!