mossmatters / HybPiper

Recovering genes from targeted sequence capture data
GNU General Public License v3.0
115 stars 45 forks source link

Issue with paralog_retriever #134

Closed erika-r-moore closed 12 months ago

erika-r-moore commented 1 year ago

Hello,

I am having issue with the paralog_retriever command as it is not catching paralogs that are known to occur in some of my samples.

For some background, I have run HybPiper2 before on some of these samples and retained paralogs for sequences with known duplications (paralogs were expected). I recently re-ran all the sequences to make sure that intronerate was run as I was having some issues with a different tool that uses HybPiper results as the input, and the paralog command is now failing with this error:


                                                     T
                                                        T
                                         C  G
 _    _            _       _____      T        G        A
| |  | |          | |     |  _  \  A              A  A
| |__| | __    __ | |___  | |_| |  _   _____   _____   _____
|  __  | \ \  / / |  _  \ |  ___/ | | |  _  \ |  _  | |  _  \
| |  | |  \ \/ /  | |_| | | |     | | | |_| | |  __/  | |  --
|_|  |_|   \  /   |_____/ |_|     |_| |  ___/ |_____| |_|
           / /                        | |
          /_/                         |_|

[INFO]:    Recovering paralog sequences...
[INFO]:    Creating directory: paralogs_all
[INFO]:    Creating directory: paralogs_no_chimeras
[INFO]:    Searching for paralogs for 111 samples, 1061 genes...
           Elapsed Time: 0:00:00|                               |ETA:  --:--:--
No chimeric stitched contig summary file found for sample Achillea_nobilis!
Traceback (most recent call last):
  File "/home/ermoore3/miniconda2/envs/hybpiper2/bin/hybpiper", line 10, in <module>
    sys.exit(main())
  File "/home/ermoore3/miniconda2/envs/hybpiper2/lib/python3.9/site-packages/hybpiper/assemble.py", line 1694, in main
    args.func(args)
  File "/home/ermoore3/miniconda2/envs/hybpiper2/lib/python3.9/site-packages/hybpiper/assemble.py", line 1528, in paralog_retriever_main
    paralog_retriever.main(args)
  File "/home/ermoore3/miniconda2/envs/hybpiper2/lib/python3.9/site-packages/hybpiper/paralog_retriever.py", line 351, in main
    num_seqs, has_paralogs, seqs_to_write_all, seqs_to_write_no_chimeras = retrieve_gene_paralogs_from_sample(
  File "/home/ermoore3/miniconda2/envs/hybpiper2/lib/python3.9/site-packages/hybpiper/paralog_retriever.py", line 69, in retrieve_gene_paralogs_from_sample
    chimeric_genes_to_skip = get_chimeric_genes_for_sample(sample_directory_name)
  File "/home/ermoore3/miniconda2/envs/hybpiper2/lib/python3.9/site-packages/hybpiper/retrieve_sequences.py", line 48, in get_chimeric_genes_for_sample
    with open(f'{sample_directory_name}/'
FileNotFoundError: [Errno 2] No such file or directory: 'Achillea_nobilis/Achillea_nobilis_genes_derived_from_putative_chimeric_stitched_contig.csv'
           Elapsed Time: 0:00:00|###############################|Time:  0:00:00

The assemble, retrieve_sequences, and stats commands appeared to have worked otherwise.

The command I used for assemble is:

while read name; 
do hybpiper assemble -t_dna cos_hyb_piper_targets.fasta -r ./$name*.fastq.tp.fastq --prefix $name --bwa --timeout_exonerate_contigs 300;
done < namelist.txt

I attached the stats file from the previous run and this one as an Excel sheet. I also attached the output and error files associated with this run.

Please let me know if you need anything else from me or if you have any suggestions to what might have happened!

Thanks so much!

hybpip-4945364.err.gz hybpip-4945364.out.txt HybPiper2_stats.xlsx

chrisjackson-pellicle commented 1 year ago

Hi @erika-r-moore,

The missing file that's producing the error you're seeing above (i.e. Achillea_nobilis/Achillea_nobilis_genes_derived_from_putative_chimeric_stitched_contig.csv) should have been written to the Achillea_nobilis sample directory when hybpiper assemble was run. In the hybpip-4945364.err file you attached, you can see that the hybpiper assemble command failed for Achillea_nobilis (and several other samples) with the error:

Traceback (most recent call last):
  File "/home/ermoore3/miniconda2/envs/hybpiper2/lib/python3.9/site-packages/hybpiper/assemble.py", line 1048, in exonerate_multiprocessing
    gene_name, prot_length, run_time = future.result()
  File "/home/ermoore3/miniconda2/envs/hybpiper2/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/home/ermoore3/miniconda2/envs/hybpiper2/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
pebble.common.ProcessExpired: Abnormal termination

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ermoore3/miniconda2/envs/hybpiper2/bin/hybpiper", line 10, in <module>
    sys.exit(main())
  File "/home/ermoore3/miniconda2/envs/hybpiper2/lib/python3.9/site-packages/hybpiper/assemble.py", line 1694, in main
    args.func(args)
  File "/home/ermoore3/miniconda2/envs/hybpiper2/lib/python3.9/site-packages/hybpiper/assemble.py", line 1427, in assemble
    exonerate_multiprocessing(genes,
  File "/home/ermoore3/miniconda2/envs/hybpiper2/lib/python3.9/site-packages/hybpiper/assemble.py", line 1073, in exonerate_multiprocessing
    print(f'error.traceback is: {error.traceback}')  # traceback of the function
AttributeError: 'ProcessExpired' object has no attribute 'traceback'

Before we troubleshoot this further, are you able to update to the latest version of HybPiper (version 2.1.6) and try the run again, please?

A couple of other things:

1) You mentioned you wanted to run intronerate - note that in version 2.1.6, intronerate is run by default.

2) In the Excel spreadsheet you attached, the sheet 1 label states '108 samples', whereas there are 117 samples. Might've just been a typo, but I wanted to make sure.

Cheers,

Chris

erika-r-moore commented 12 months ago

Hello Chris!

As suggested, updating my HybPiper to v.2.1.6 (not 2.0.1) solved this issue. Thank you for your help!

Best, Erika