mossmatters / HybPiper

Recovering genes from targeted sequence capture data
GNU General Public License v3.0

'hybpiper assemble' didn't capture any target genes #144

Open xueqinerer opened 5 months ago

xueqinerer commented 5 months ago

Hi, I am using HybPiper v2.1.6. When I run 'hybpiper assemble', no target genes are recovered, and I can't work out why. Could there be a problem with the format of my test_R*.fq files? Please let me know if you need any more information, and thanks in advance for your time!

The command I ran is:

hybpiper assemble -t_dna test_finall.fa -r test_R*.fq --prefix test --bwa

This is test_finall.fa format:

>Acaena_tenera-4471
TTTATTATGCCACTGGACTTGGGTGCAAAAGGAAGCTGCCAGATTGGTGGAAACGTTTCA
ACCAATGCAGGTGGTTTGCGCCTTGTGCGCTATGGATCACTTCATGGAACCGTACTTGGT
CTTGAAGTTGTTTTGGCTAATGGTGATGTTCTTGACATGCTTGGGACTTTACGGAAAGAT
AACACTGGGTATGACCTAAAGCATTTGTTCATAGGAAGTGAAGGATCTTTGGGAATTGTG
ACCAAGGTTTCCATACTTACCCCTCCAAAGTGGTTTTCAGTGAATGTAGCTTTCCTTGCA
TGTCAAGACTACTTTAGCTGCCAGAAACTTCTAGTGGAAGCAAAGAACAAACTTGGGGAG
ATTCTATCTGCATTTGAATTCTTGGATAGCCATGCTATGGATTTGGTCGTTTTGAATCAT
TTGGATGGTGTTCGCAATCCATTACCTCCCACAATTCTCAACTTTTATGTTTTGATTGAG
ACAACAGGCAGTGATGAAACTTCTGACAGGTACCACCTAGACAGAGAGAAGCTTGAAGCC
TTCCTAGTTCATGCCATGGAAGGTGGTTTGATTTCGGATGGAGCTATAGCTCAAGACATA
AACCAAGCATCAGCGTTCTGGTATATAAGAGAG
>Acaena_tenera-4527
GAAGAGAGGATGAGTGTTTTGGTGATTGGTGGAGGGGGAAGGGAACATGCTCTTTGCTAT
GCATTGAAGCGGTCCCCGAGTTGTGATGCGGTTTTTTGTGCTCCGGGAAATGTTGGAATT
TCCAACGCTGGGGTTGCCACTTGCATTTCGGACCTCGACATCTTTGATAGCTCGGCTGTG
ATTTCCTTCTGCCACAAATGGGGTGTGGGGCTTGTTGTTGTTGGACCCGAGGCCCCTCTT
GTTGCGGGTCTGGTGAATGATCTACTTAAGGCTGGAATCCATGCTTTTGGCCCATCATCT
GAGGCCGCTGCTTTGGAAGGATCCAAGAACTTTATGAAAACTTTGTGTGACAAGTATGGA

This is test_R1.fq format:

@SRR18716335.9 9 length=300
GAGATCCTTGGTATGATCCAAGTGCAGCAGTAGCTCTTTCCCGCACAGCAGACGTCACCGTTTCACAGGATGGG
+SRR18716335.9 9 length=300
8ACCGGGGGGGGFGGGGGGGGDFGGFFGFGGGGG9FGGGF<EGEDEGCAEFFG7FGCEFFE@<FCGGGGFGGGG
@SRR18716335.10 10 length=299
GAACAATGGACTATTCACTATGCTATAATGGATTTCCAAACGTGGTAGATGGATACAATGATGCAAATTGAATT
+SRR18716335.10 10 length=299
8BCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGDFFGGGGGGGFGGGGGGFFDFGGGF
@SRR18716335.11 11 length=300
TCTCATGTCAGAATAGATTTTCATATATTTATTATAAAATTCAAATCTAAGATTTCCGATAAGATTTTTAGATAANAT

chrisjackson-pellicle commented 5 months ago

Hi @xueqinerer ,

Your reads look fine. What does the test_hybpiper_assemble_<date_time>.log file in your test sample directory say? Feel free to upload it here and I can have a look too.
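For anyone who wants to check the FASTQ layout independently, here is a minimal sketch in Python. It is not part of HybPiper; the function name `check_fastq` is made up for illustration. It assumes plain four-line records (header starting with `@`, sequence, separator starting with `+`, quality string of the same length as the sequence) and ignores blank lines.

```python
import sys
from itertools import zip_longest


def check_fastq(lines):
    """Return a list of problems found in four-line FASTQ records.

    Hypothetical helper for illustration only; it does not check
    base or quality alphabets, only the record layout.
    """
    problems = []
    # Strip line endings and skip blank lines (e.g. a trailing newline).
    stripped = (line.rstrip("\r\n") for line in lines if line.strip())
    # Group the stream into chunks of four lines without loading the file.
    for number, record in enumerate(zip_longest(*[stripped] * 4), start=1):
        if None in record:
            present = sum(item is not None for item in record)
            problems.append(f"record {number}: incomplete ({present} of 4 lines)")
            break
        header, seq, plus, qual = record
        if not header.startswith("@"):
            problems.append(f"record {number}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {number}: separator does not start with '+'")
        if len(seq) != len(qual):
            problems.append(
                f"record {number}: sequence length {len(seq)} "
                f"!= quality length {len(qual)}"
            )
    return problems


if __name__ == "__main__":
    # Usage: python check_fastq.py test_R1.fq test_R2.fq
    for path in sys.argv[1:]:
        with open(path) as handle:
            issues = check_fastq(handle)
        print(path, "OK" if not issues else issues[:10])
```

An empty list means every record has the expected four-line shape; it says nothing about whether the reads match the target file.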

Cheers,

Chris

xueqinerer commented 5 months ago

Hi @chrisjackson-pellicle, this is my log file: test_hybpiper_assemble_2024-03-28-20_36_37.log. Thanks very much.

chrisjackson-pellicle commented 5 months ago

So, I can see that your reads are successfully mapped to your target file sequences, and line 11,058 shows:

2024-03-28 20:37:41,806 - assemble.py - hybpiper.assemble - distribute_bwa - INFO - [INFO]:    In total, 45732 reads from the paired-end read files will be distributed to gene directories

However, line 11,068 shows that the initial attempt at assembly of these reads using SPAdes failed for 343 of the 353 genes:

2024-03-28 20:38:39,119 - spades_runner.py - hybpiper.assemble.hybpiper.spades_runner - spades_initial - INFO - [WARNING]: Total number of genes with failed initial SPAdes run: 343. Gene names can be found in the sample log file.

...and that the re-run of SPAdes assemblies (which tries again with shorter kmers) failed for 342/353 genes (line 11,079):

2024-03-28 20:38:47,960 - spades_runner.py - hybpiper.assemble.hybpiper.spades_runner - rerun_spades - INFO - [WARNING]: Total number of genes with failed SPAdes re-runs: 342. Gene names can be found in the sample log file.

Then, in lines 11,445 - 11,507, you can see that Exonerate failed for all 11 genes that actually had SPAdes contigs.
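The counts above can also be pulled out of the log with a short script. This is a rough sketch in Python, not a HybPiper feature: the function name `summarise_log` and the regular expressions are assumptions based only on the three log lines quoted in this comment, and the total of 353 genes is passed in by hand. The number of genes with contigs is estimated the same way as above, as the total minus the failed re-runs.

```python
import re

# Patterns written against the log lines quoted in this thread;
# other HybPiper versions may word these messages differently.
PATTERNS = {
    "reads_distributed": re.compile(r"In total, (\d+) reads"),
    "failed_initial": re.compile(r"failed initial SPAdes run: (\d+)"),
    "failed_rerun": re.compile(r"failed SPAdes re-runs: (\d+)"),
}


def summarise_log(lines, total_genes):
    """Collect read and SPAdes failure counts from assemble-log lines."""
    counts = {}
    for line in lines:
        for key, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[key] = int(match.group(1))
    if "failed_rerun" in counts:
        # Genes that still have SPAdes contigs after the re-run.
        counts["genes_with_contigs"] = total_genes - counts["failed_rerun"]
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarise_log.py <log file> <number of target genes>
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as handle:
            print(summarise_log(handle, int(sys.argv[2])))
```

For this log, with 353 target genes, that arithmetic gives 353 - 342 = 11 genes with contigs, which matches the 11 genes that reached the Exonerate step.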

It looks like something is going wrong during the SPAdes assembly step. Can you zip and upload the spades.log file in your test sample directory? It might be too large - in that case, can you upload a portion of it?
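If the file is too large to attach, one way to cut it down is to zip only the last part of it. The sketch below is in Python and is only a suggestion: the helper name `zip_log_tail` and the default of 2000 lines are arbitrary, and taking the end of the file assumes the error messages are near the end, which may not hold for every run. The path to spades.log has to be supplied by you.

```python
import zipfile
from collections import deque
from pathlib import Path


def zip_log_tail(log_path, zip_path, max_lines=2000):
    """Write the last `max_lines` lines of a log file into a zip archive.

    Hypothetical helper for illustration. Returns the number of lines kept.
    """
    log_path = Path(log_path)
    # deque with maxlen keeps only the final lines while streaming the file.
    with log_path.open("r", errors="replace") as handle:
        tail = deque(handle, maxlen=max_lines)
    member_name = f"{log_path.stem}_tail{log_path.suffix}"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, "".join(tail))
    return len(tail)


if __name__ == "__main__":
    import sys

    # Usage: python zip_log_tail.py <path to spades.log> <output .zip>
    if len(sys.argv) == 3:
        kept = zip_log_tail(sys.argv[1], sys.argv[2])
        print(f"wrote {kept} lines to {sys.argv[2]}")
```

If the start of the log turns out to be more useful, the same idea works with the first lines instead of the last.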

Cheers,

Chris