merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Error when importing contigs with gene calls from external file #371

Closed. jtremblay closed this issue 8 years ago.

jtremblay commented 8 years ago

Dear Anvio developers, I ran the following command:

anvi-gen-contigs-database --debug --external-gene-calls ./gene_prediction/Contigs_foranvio.tsv -f ./assembly/Contigs_corrheaders.fasta -o anvio/Contigs.db

and got the following stack trace:

Traceback (most recent call last):
  File "/gs/scratch/jtrembla/anvio/anvio/bin/anvi-gen-contigs-database", line 43, in <module>
    a.create(args)
  File "/gs/scratch/jtrembla/anvio/anvio/anvio/dbops.py", line 1161, in create
    gene_calls_tables.populate_genes_in_splits_tables()
  File "/gs/scratch/jtrembla/anvio/anvio/anvio/dbops.py", line 1856, in populate_genes_in_splits_tables
    genes_in_splits.add(split_name, start, stop, self.gene_calls_dict_id_to_db_unique_id[gene_callers_id], self.gene_calls_dict[gene_callers_id]['start'], self.gene_calls_dict[gene_callers_id]['stop'])
KeyError: 0

After some debugging, I found that the call to genes_in_splits.add(...) at line 1856 of dbops.py fails because it looks up gene_callers_id 0, for which there is no matching gene. I called my genes with MetaGeneMark, whose gene numbering starts at 1 rather than 0, which may explain the error.

gene_callers_id  contig    start  stop  direction  partial  source        version
1                contig-0  2      320   f          1        metagenemark  v1.0
2                contig-0  742    1453  f          1        metagenemark  v1.0
3                contig-0  1456   1981  f          1        metagenemark  v1.0
4                contig-0  2224   2425  f          1        metagenemark  v1.0
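For readers unfamiliar with this failure mode, here is a minimal sketch of how a KeyError: 0 arises. A dictionary keyed by the gene_callers_id values from a file like the one above (which start at 1) raises it as soon as something asks for ID 0. The sketch assumes the tab-separated column layout shown in the table. The loader name load_gene_calls is hypothetical, and this is not anvi'o's actual parsing code.

```python
import csv
import io

# Tab-separated external gene calls, using the column layout shown above.
EXAMPLE_TSV = (
    "gene_callers_id\tcontig\tstart\tstop\tdirection\tpartial\tsource\tversion\n"
    "1\tcontig-0\t2\t320\tf\t1\tmetagenemark\tv1.0\n"
    "2\tcontig-0\t742\t1453\tf\t1\tmetagenemark\tv1.0\n"
    "3\tcontig-0\t1456\t1981\tf\t1\tmetagenemark\tv1.0\n"
    "4\tcontig-0\t2224\t2425\tf\t1\tmetagenemark\tv1.0\n"
)


def load_gene_calls(handle):
    """Hypothetical loader: map integer gene_callers_id -> basic coordinates."""
    gene_calls = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        gene_id = int(row["gene_callers_id"])
        gene_calls[gene_id] = {
            "contig": row["contig"],
            "start": int(row["start"]),
            "stop": int(row["stop"]),
        }
    return gene_calls


gene_calls_dict = load_gene_calls(io.StringIO(EXAMPLE_TSV))
print(sorted(gene_calls_dict))  # [1, 2, 3, 4]

try:
    gene_calls_dict[0]["start"]  # no gene has ID 0 in this file
except KeyError as err:
    print("KeyError:", err)  # KeyError: 0
```

The IDs in the file are perfectly valid. The error appears only when some other part of the code produces an ID that was never in the file.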

I did a temporary fix at line 1852:

    for gene_callers_id in gene_calls_in_contigs_dict[contig]:
        # temporary workaround: skip gene ID 0 (MetaGeneMark numbering starts at 1)
        if gene_callers_id == 0:
            continue

This does the job for now, but it would probably be good to implement a more robust fix in the future. Cheers,
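Hard-coding a skip for ID 0 works for this file but would not catch any other stray ID. One slightly more general option is to check up front which referenced IDs have no gene call and report them clearly. The sketch below shows this with a hypothetical helper, find_unknown_gene_ids. It assumes gene_calls_in_contigs_dict maps each contig name to an iterable of gene IDs and that gene_calls_dict is keyed by gene ID. It is only an illustration, not anvi'o code.

```python
def find_unknown_gene_ids(gene_calls_in_contigs_dict, gene_calls_dict):
    """Return {contig: [gene IDs with no entry in gene_calls_dict]}.

    Hypothetical pre-flight check: rather than special-casing ID 0, report
    every ID referenced per contig that has no matching gene call.
    """
    unknown = {}
    for contig, gene_ids in gene_calls_in_contigs_dict.items():
        missing = [g for g in gene_ids if g not in gene_calls_dict]
        if missing:
            unknown[contig] = missing
    return unknown


# Example data: IDs from a 1-based caller, plus a stray 0 on contig-0.
gene_calls_dict = {1: {"start": 2, "stop": 320}, 2: {"start": 742, "stop": 1453}}
gene_calls_in_contigs_dict = {"contig-0": [0, 1, 2]}

unknown = find_unknown_gene_ids(gene_calls_in_contigs_dict, gene_calls_dict)
if unknown:
    # Report a readable message instead of a bare KeyError deep inside the loop.
    details = "; ".join(f"{c}: {ids}" for c, ids in unknown.items())
    print(f"gene IDs without gene calls -> {details}")
```

Whether to then skip the unknown IDs or abort is a judgment call. Skipping them silently can hide a real bug upstream.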

meren commented 8 years ago

Hi Julien,

This is scary! :)

This looks like the v2 branch (thank you for trying it). May I ask which commit you are at? The output of git log | head -n 10 would be very helpful, because I have something else at line 1856 in my dbops.py :)

Best wishes,

A. Murat Eren (meren) http://merenlab.org :: gpg https://keybase.io/meren


jtremblay commented 8 years ago

This is the output of git log | head -n 10:


commit 048d73967d5fca09380d1126b5880ee2ef76955f
Author: A. Murat Eren <a.murat.eren@gmail.com>
Date:   Thu Jun 16 16:46:37 2016 -0500

    bad source name is fixed.

commit f5a8afe3d1194820d424bf5d52324dd93d2e2167
Author: A. Murat Eren <a.murat.eren@gmail.com>
Date:   Thu Jun 16 15:38:21 2016 -0500

Yeah, I should probably have used a stable release :) I installed with git clone --recursive https://github.com/meren/anvio.git. Perhaps I should re-clone?

jtremblay commented 8 years ago

Eren, one more thing: I had added a line printing to stderr while debugging, which does shift the line numbers by one...

meren commented 8 years ago

The stable release is too far behind. I will release the v2 branch very soon, so having someone like you test it is perfect! :)

Can you run git checkout anvio/dbops.py (to discard your local changes to that file) and then git pull to get the latest codebase?

Then if the error persists, I will take a more careful look, because clearly this should never happen :)

Thank you again!

Best wishes,

A. Murat Eren (meren) http://merenlab.org :: gpg https://keybase.io/meren


jtremblay commented 8 years ago

Hi Eren, okay, done, but unfortunately I get the same error again :|

Traceback (most recent call last):
  File "/gs/scratch/jtrembla/anvio/anvio/bin/anvi-gen-contigs-database", line 43, in <module>
    a.create(args)
  File "/gs/scratch/jtrembla/anvio/anvio/anvio/dbops.py", line 1177, in create
    gene_calls_tables.populate_genes_in_splits_tables()
  File "/gs/scratch/jtrembla/anvio/anvio/anvio/dbops.py", line 1871, in populate_genes_in_splits_tables
    genes_in_splits.add(split_name, start, stop, self.gene_calls_dict_id_to_db_unique_id[gene_callers_id], self.gene_calls_dict[gene_callers_id]['start'], self.gene_calls_dict[gene_callers_id]['stop'])
KeyError: 0

Cheers,

meren commented 8 years ago

I am looking into this, Julien. I apologize for the inconvenience. Meanwhile, would you mind sending me a FASTA file that contains only contig-0 from Contigs_corrheaders.fasta, along with your Contigs_foranvio.tsv file, so I can make doubly sure that things will work?
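If you need to prepare a minimal test case like this yourself, a short Python sketch can pull one contig out of a FASTA file and the matching rows out of an external gene calls file. It assumes that the first word of each FASTA header is the contig name and that the gene calls file uses the tab-separated layout with a contig column shown earlier in this issue. The function names are hypothetical, and this is not part of anvi'o.

```python
import csv
import io


def extract_contig_fasta(fasta_handle, contig_name):
    """Return the FASTA record (header plus sequence lines) for one contig."""
    out, keep = [], False
    for line in fasta_handle:
        if line.startswith(">"):
            # Match on the first whitespace-delimited word of the header.
            header = line[1:].split()
            keep = bool(header) and header[0] == contig_name
        if keep:
            out.append(line)
    return "".join(out)


def extract_contig_gene_calls(tsv_handle, contig_name):
    """Return the header row plus the gene-call rows for one contig, as TSV text."""
    reader = csv.reader(tsv_handle, delimiter="\t")
    header = next(reader)
    contig_col = header.index("contig")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        if row and row[contig_col] == contig_name:
            writer.writerow(row)
    return buf.getvalue()


# Small in-memory example; real use would open the files with open(...).
fasta = ">contig-0\nACGT\nTTGA\n>contig-1\nGGGG\n"
tsv = (
    "gene_callers_id\tcontig\tstart\tstop\tdirection\tpartial\tsource\tversion\n"
    "1\tcontig-0\t2\t320\tf\t1\tmetagenemark\tv1.0\n"
    "5\tcontig-1\t10\t90\tf\t1\tmetagenemark\tv1.0\n"
)
print(extract_contig_fasta(io.StringIO(fasta), "contig-0"), end="")
print(extract_contig_gene_calls(io.StringIO(tsv), "contig-0"), end="")
```

For the real files, pass handles such as open("Contigs_corrheaders.fasta") and open("Contigs_foranvio.tsv") and write the returned strings to new files.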

A. Murat Eren (meren) http://merenlab.org :: gpg https://keybase.io/meren


meren commented 8 years ago

Never mind!

I managed to reproduce the error!

meren commented 8 years ago

Hi Julien,

Commit 1227f5e117b55be0eb6609362ef63b43575ee30b fixes this. Can you please update your repo and try again? :)

There may be other issues downstream, but the one we have been discussing should be fixed now.

Thank you very much for your patience and help!

jtremblay commented 8 years ago

Yes, it works! Many thanks, Eren. -Julien