tyjo / coptr

Accurate and robust inference of microbial growth dynamics from metagenomic sequencing
GNU General Public License v3.0

Error in last step! #3

Closed BJCampbelllab closed 3 years ago

BJCampbelllab commented 3 years ago

Hi, I am running about 8 high-quality MAGs through CoPTR with about 20 metagenomes. Everything worked until the middle of the last step; the error is below. The command did generate a coverage-maps-genome folder within my maps folder, and it contains PKL files that appear to cover all possible contigs within each MAG. I tried lowering the minimum reads and coverage. Any help would be greatly appreciated! Thanks, Barb

[INFO] (Mar 04, 2021 08:53:25) CoPTR: done grouping by reference genome
[INFO] (Mar 04, 2021 08:53:25) CoPTR: the --restart flag can be used to start from here
[INFO] (Mar 04, 2021 08:53:25) CoPTRRef: checking reference genomes
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/bcampbell/anaconda3/envs/coptr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/bcampbell/anaconda3/envs/coptr/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/bcampbell/coptr/src/coptr_ref.py", line 813, in _parallel_helper
    return (ref_genome, self.estimate_ptrs(coverage_maps))
  File "/home/bcampbell/coptr/src/coptr_ref.py", line 763, in estimate_ptrs
    read_positions, ref_genome_len, qc_result = rf.filter_reads(cm.read_positions, cm.length)
  File "/home/bcampbell/coptr/src/coptr_ref.py", line 108, in filter_reads
    filtered_read_positions, filtered_genome_length = self.filter_reads_phase1(read_positions, genome_length, bin_size)
  File "/home/bcampbell/coptr/src/coptr_ref.py", line 326, in filter_reads_phase1
    binned_reads = self.bin_reads(read_positions, genome_length, bin_size)
  File "/home/bcampbell/coptr/src/coptr_ref.py", line 445, in bin_reads
    nbins = int(math.ceil(genome_length / bin_size))
ZeroDivisionError: float division by zero
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/bcampbell/coptr/coptr.py", line 449, in <module>
    ProgramOptions()
  File "/home/bcampbell/coptr/coptr.py", line 65, in __init__
    getattr(self, args.command)()
  File "/home/bcampbell/coptr/coptr.py", line 287, in estimate
    threads=args.threads, plot_folder=args.plot
  File "/home/bcampbell/coptr/src/coptr_ref.py", line 935, in estimate_ptrs_coptr_ref
    flat_results = pool.map(coptr_ref._parallel_helper, flat_coverage_maps)
  File "/home/bcampbell/anaconda3/envs/coptr/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/bcampbell/anaconda3/envs/coptr/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ZeroDivisionError: float division by zero
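
For reference, the failing frame can be reproduced in isolation. This is only a minimal sketch built from the single expression shown in the traceback, `int(math.ceil(genome_length / bin_size))`; the function name `count_bins`, the guard, and the example numbers are hypothetical and are not part of CoPTR. How CoPTR arrives at a `bin_size` of zero is not shown in the traceback and is not assumed here.

```python
import math


def count_bins(genome_length, bin_size):
    """Hypothetical stand-in for the bin-count expression in the traceback."""
    if bin_size <= 0:
        # Assumption: a non-positive bin size means the input sequence
        # could not be binned, so fail with a descriptive message instead.
        raise ValueError(f"bin_size must be positive, got {bin_size}")
    return int(math.ceil(genome_length / bin_size))


# The unguarded expression raises ZeroDivisionError when bin_size is 0.0,
# which is the error class reported at the bottom of the traceback.
reproduced = False
try:
    int(math.ceil(4_000_000 / 0.0))
except ZeroDivisionError:
    reproduced = True
```

A guard like this would not fix the underlying problem; it would only make the failure easier to read.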

tyjo commented 3 years ago

Hi Barb,

Thanks for reaching out! It looks like CoPTR is using the complete reference genome estimator instead of the contig estimator on your MAGs.

You can check if CoPTR is correctly identifying MAGs on the "extract" step with the --check-regex flag. On the example data:

python coptr.py extract example-data/bam example-data/coverage-maps --check-regex

This should output an id for each reference genome. If you see an id for each contig, something has gone wrong. Would you run the above, and let me know the result?

If CoPTR is outputting an id for each contig, then something has gone wrong in the indexing or mapping step. A few possibilities:

  1. Did you index your MAGs and perform read mapping with CoPTR's wrapper around bowtie2? CoPTR adds information to keep track of complete references vs assemblies.
  2. Is each MAG in its own fasta file?

Tyler

BJCampbelllab commented 3 years ago

Hi Tyler, Thanks for getting back to me so soon!

It gives a list of MAGs with their contigs, so the ids appear to be per contig rather than per genome/MAG. The first two lines:

CP_Spr15L08_56|contig-151_1051
CP_Spr15L08_56|contig-151_10528
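
To make the per-contig versus per-genome distinction concrete, here is a small sketch that collapses ids like the two above into one entry per MAG. It assumes, based only on these two ids, that the text before the `|` is the MAG name and the text after it is the contig name; the helper `group_by_genome` is hypothetical and not a CoPTR function.

```python
from collections import defaultdict


def group_by_genome(sequence_ids, sep="|"):
    """Group sequence ids by the part before `sep` (assumed to be the MAG name)."""
    groups = defaultdict(list)
    for seq_id in sequence_ids:
        # partition() leaves the contig part empty if the separator is missing.
        genome, _, contig = seq_id.partition(sep)
        groups[genome].append(contig)
    return dict(groups)


ids = [
    "CP_Spr15L08_56|contig-151_1051",
    "CP_Spr15L08_56|contig-151_10528",
]
groups = group_by_genome(ids)
```

If the number of keys in `groups` matches the number of MAGs, the ids are per contig but still carry the MAG name, which is what the output above suggests.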

This is what I ran for everything:

python /home/bcampbell/coptr/coptr.py index /home/bcampbell/Pelagibacterales/Pelagibacter_for_v7/original_fasta/CoPTR_analyses /home/bcampbell/Pelagibacterales/Pelagibacter_for_v7/original_fasta/CoPTR_analyses/CoPTRindex

python /home/bcampbell/coptr/coptr.py map --threads 40 /home/bcampbell/Pelagibacterales/Pelagibacter_for_v7/original_fasta/CoPTR_analyses/CoPTRindex /mnt/ocean/RawData/CB_DNA_trimmed/cutadapt_sickle /home/bcampbell/Pelagibacterales/Pelagibacter_for_v7/original_fasta/CoPTR_analyses/BAM

python /home/bcampbell/coptr/coptr.py extract --ref-genomes-regex /home/bcampbell/Pelagibacterales/Pelagibacter_for_v7/original_fasta/CoPTR_analyses/CoPTRindex.genomes /home/bcampbell/Pelagibacterales/Pelagibacter_for_v7/original_fasta/CoPTR_analyses/BAM /home/bcampbell/Pelagibacterales/Pelagibacter_for_v7/original_fasta/CoPTR_analyses/

So, yes, I used the CoPTR mapper.

Each MAG is in its own fasta file.

The CoPTRindex.genomes file contents (my 8 MAGs):

CP_Sum27L08_15
CP_Spr15G08_4
CP_Sum15G08_5
DE_Sum22DL08_52
CP_Spr15L08_1
CP_Sum15G08_40
DE_Sum29DL08_25
CP_Spr15L08_56

Any additional thoughts would be terrific! My dataset is cool – I have evidence of more ‘activity’ in some samples or MAGs than others, because I have metagenomes and metatranscriptomes and RPKG ratios of the two are suggesting different levels of activity depending on the MAG and sample. I think PTR estimates would be a great addition to the story!

Best, Barb


tyjo commented 3 years ago

Thanks for the detailed report. It sounds like an interesting dataset.

The issue is this step:

python /home/bcampbell/coptr/coptr.py extract --ref-genomes-regex /home/bcampbell/Pelagibacterales/Pelagibacter_for_v7/original_fasta/CoPTR_analyses/CoPTRindex.genomes /home/bcampbell/Pelagibacterales/Pelagibacter_for_v7/original_fasta/CoPTR_analyses/BAM /home/bcampbell/Pelagibacterales/Pelagibacter_for_v7/original_fasta/CoPTR_analyses/

You should delete the old coverage map files and rerun extract without the --ref-genomes-regex flag. That flag is only meant for BAM files generated outside of CoPTR.
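
The two steps above can be sketched in Python. This is a hedged sketch, not CoPTR tooling: the helpers `clear_coverage_maps` and `extract_command` are hypothetical, and the `*.pkl` pattern is an assumption based on the PKL files mentioned earlier in the thread. The command it builds uses only the positional form of `extract` that already appears in this thread (BAM folder, then output folder). Check the folder before deleting anything.

```python
from pathlib import Path


def clear_coverage_maps(folder):
    """Delete *.pkl files directly inside `folder` and return their names."""
    removed = []
    for pkl in sorted(Path(folder).glob("*.pkl")):
        pkl.unlink()
        removed.append(pkl.name)
    return removed


def extract_command(coptr_script, bam_folder, out_folder):
    """Build the extract call without the --ref-genomes-regex flag."""
    return ["python", str(coptr_script), "extract", str(bam_folder), str(out_folder)]
```

The returned list can be passed to `subprocess.run(..., check=True)` once the old coverage maps are gone.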

Let me know if you run into any other issues.

BJCampbelllab commented 3 years ago

Thanks! I’ll let you know if I have further issues.


tyjo commented 3 years ago

I'm going to close this issue for now. Please let me know if you are still running into trouble and we can revisit.

BJCampbelllab commented 3 years ago

Thanks. Seems to be good!
