lucapinello / CRISPResso

Software pipeline for the analysis of CRISPR-Cas9 genome editing outcomes from sequencing data

free variable 'df_genes' referenced before assignment in enclosing scope", u'occurred at index Site1' #39

tiramisutes closed this issue 6 years ago

tiramisutes commented 6 years ago

Hi, I get the following error when running CRISPResso in mixed mode (Amplicons + Genome):

ERROR: ("free variable 'df_genes' referenced before assignment in enclosing scope", u'occurred at index Site1') 

The genome I am using does not exist in UCSC, so I created the gene annotation file by converting a GFF3 annotation file to a genePred file and passed it to the --gene_annotations parameter. What is wrong with it, and how can I solve this problem? Thanks.

lucapinello commented 6 years ago

Hi

It may be related to the format you are currently using.

This is an example of a valid file:

bin	name	chrom	strand	txStart	txEnd	cdsStart	cdsEnd	exonCount	exonStarts	exonEnds	score	name2	cdsStartStat	cdsEndStat	exonFrames
183	ENST00000358204.4	chr7	+	115850546	115898837	115850761	115897536	7	115850546,115874587,115889073,115890214,115891813,115892371,115897347,	115850788,115874673,115889326,115890550,115892029,115892530,115898837,	0	TES	cmpl	cmpl	0,0,2,0,0,0,0,
183	ENST00000537767.1	chr7	+	115850602	115897722	115891837	115897536	5	115850602,115874587,115891813,115892371,115897347,	115850788,115874673,115892029,115892530,115897722,	0	TES	cmpl	cmpl	-1,-1,0,0,0,
183	ENST00000393481.2	chr7	+	115862857	115898837	115874587	115897536	7	115862857,115874587,115889073,115890214,115891813,115892371,115897347,	115863080,115874673,115889326,115890550,115892029,115892530,115898837,	0	TES	cmpl	cmpl	-1,0,2,0,0,0,0,
183	ENST00000485009.1	chr7	+	115863004	115890086	115863004	115863004	3	115863004,115874587,115889073,	115863080,115874673,115890086,	0	TES	none	none	-1,-1,-1,
1469	ENST00000456289.1	chr7	-	115877982	115879305	115877982	115877982	2	115877982,115879068,	115878330,115879305,	0	AC073130.3	none	none	-1,-1,
1469	ENST00000444244.1	chr7	-	115878057	115967950	115878057	115878057	5	115878057,115879068,115883895,115895171,115967885,	115878386,115879303,115883990,115895289,115967950,	0	AC073130.3	none	none	-1,-1,-1,-1,-1,

The file should be gzip compressed.
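Before running CRISPResso, it can help to confirm that the file really is gzip compressed and starts with the same tab-separated header as the example above. The bash sketch below does that; check_annotation_header is a hypothetical helper (not part of CRISPResso), and treating this exact 16-column header as required is an assumption based on the example file.

```bash
# Assumption: the 16 columns, in this order, are copied from the example
# file above; CRISPResso's exact requirements may differ.
EXPECTED_COLUMNS=(bin name chrom strand txStart txEnd cdsStart cdsEnd
                  exonCount exonStarts exonEnds score name2
                  cdsStartStat cdsEndStat exonFrames)

# check_annotation_header FILE.gz  (hypothetical helper, not part of CRISPResso)
# Returns 0 if FILE.gz is valid gzip and its first line is exactly the
# expected tab-separated header; otherwise explains the problem on stderr.
check_annotation_header() {
    local file="$1" header expected col missing=0
    if ! gzip -t "$file" 2>/dev/null; then
        echo "$file is not a valid gzip file" >&2
        return 1
    fi
    # Join the expected column names with tabs (the IFS change stays in the subshell)
    expected=$(IFS=$'\t'; echo "${EXPECTED_COLUMNS[*]}")
    # First line only; drop a Windows carriage return if present
    header=$(gzip -dc "$file" | head -n 1 | tr -d '\r')
    [ "$header" = "$expected" ] && return 0
    # Name each expected column that is not a tab-separated field of the header
    for col in "${EXPECTED_COLUMNS[@]}"; do
        if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -qxF "$col"; then
            echo "missing column: $col" >&2
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "all columns present, but not tab-separated in the expected order" >&2
    fi
    return 1
}
```

Run it as check_annotation_header annotations.gz; a header without the bin column, for example, is reported as "missing column: bin".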

The instructions for getting an annotation file in this format are in the manual. I am copying them here for your convenience:

The user can download this file from the UCSC Genome Browser ( http://genome.ucsc.edu/cgi-bin/hgTables?command=start ) by selecting "knownGene" as the table, "all fields from selected table" as the output format, and "gzip compressed" as the file returned.

Hope this is helpful.

Luca

tiramisutes commented 6 years ago

I need to create the UCSC annotation file myself because there is no existing annotation for my genome in the UCSC Genome Browser. There are some differences between the example file you showed above and the format described by the UCSC Genome Browser (Gene Predictions (Extended)).

Also, what is the first column, "bin"? Following your suggestion, I prepared the annotation file with the following header, then gzip-compressed it and passed it to the --gene_annotations parameter:

name    chrom   strand  txStart txEnd   cdsStart        cdsEnd  exonCount       exonStarts      exonEnds        score   name2   cdsStartStat    cdsEndStat      exonFrames

But I still get this error on stdout:

ERROR: ("'Series' object has no attribute 'name2'", u'occurred at index Site1') 

How can I solve this problem? Thanks.

lucapinello commented 6 years ago

Unfortunately, UCSC is not consistent across its annotations. I modeled the tool on the GENCODE v19 human gene annotation that I downloaded from their website. The file you are currently providing to CRISPResso is probably missing the name2 column (it should contain the gene symbol).

I think you can easily fix this if you use exactly the format I have provided as an example.

I don't use the bin column, so you can just fill it with 0 if you want.
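For reference, the bin values in the example above (183 and 1469) are consistent with the UCSC Genome Browser's "standard" binning scheme, which assigns each feature to the smallest bin in a fixed hierarchy (128 kb, 1 Mb, 8 Mb, 64 Mb, 512 Mb) that fully contains it, so features can be looked up by position quickly. The bash function below is a sketch of that scheme as I understand it from the UCSC Kent source; the name ucsc_bin is hypothetical, and it assumes 0-based, half-open txStart/txEnd coordinates on chromosomes shorter than 512 Mb.

```bash
# ucsc_bin START END  (hypothetical helper)
# Sketch of the UCSC "standard" binning scheme; assumes 0-based half-open
# coordinates and END <= 2^29 (512 Mb). Prints the bin number.
ucsc_bin() {
    local start=$1 end=$2 i
    # First bin number at each level: 512+64+8+1, 64+8+1, 8+1, 1, 0
    local offsets=(585 73 9 1 0)
    # Smallest bins span 2^17 bp (128 kb); END is exclusive, hence END - 1
    local start_bin=$(( start >> 17 )) end_bin=$(( (end - 1) >> 17 ))
    for i in 0 1 2 3 4; do
        # The feature fits in one bin at this level: report it
        if (( start_bin == end_bin )); then
            echo $(( offsets[i] + start_bin ))
            return 0
        fi
        # Move up one level: each bin is 8 times larger than the one below
        start_bin=$(( start_bin >> 3 ))
        end_bin=$(( end_bin >> 3 ))
    done
    echo "range $start-$end is outside the standard binning scheme" >&2
    return 1
}
```

For example, ucsc_bin 115850546 115898837 prints 183, matching the first example row. Since CRISPResso ignores this column, filling it with 0 remains the simpler choice.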

Best,

Luca

tiramisutes commented 6 years ago

Thanks. It's working.

lucapinello commented 6 years ago

Great! Would you mind sharing your annotation file here? It may be helpful for other users who are not using hg19.

Thanks!

tiramisutes commented 6 years ago

Here is how I created a UCSC annotation file for a genome that does not exist in the UCSC Genome Browser. First, you need the genome annotation file in GTF format.

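# Convert the GTF annotation to extended genePred format (15 columns, no header)
gtfToGenePred -genePredExt genome.gtf genome.gpred
# Prepend a header line and a "bin" column filled with 0 (CRISPResso does not use it)
awk 'BEGIN{OFS="\t"; print "bin\tname\tchrom\tstrand\ttxStart\ttxEnd\tcdsStart\tcdsEnd\texonCount\texonStarts\texonEnds\tscore\tname2\tcdsStartStat\tcdsEndStat\texonFrames"} {print 0, $0}' genome.gpred > UCSC_genome.gpred
# CRISPResso expects the annotation file to be gzip compressed
gzip UCSC_genome.gpred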

Finally, the resulting UCSC_genome.gpred.gz looks like this (header and first row):

bin     name    chrom   strand  txStart txEnd   cdsStart        cdsEnd  exonCount       exonStarts      exonEnds        score   name2   cdsStartStat    cdsEndStat      exonFrames
0       evm.model.Gh_A01G0001   A01     +       15704   19194   15704   19194   7       15704,16263,16883,17483,18384,18693,19061,      15772,16319,17103,17623,18503,18827,19194, 0       evm.TU.Gh_A01G0001      incmpl  incmpl  0,2,1,2,1,0,2,
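A row-level check can also catch problems such as the missing name2 data that was suspected behind the "'Series' object has no attribute 'name2'" error. This bash/awk sketch uses a hypothetical helper name, check_annotation_rows, and its rules (16 tab-separated fields, a non-empty name2 in column 13, exon lists with trailing commas whose length matches exonCount) are assumptions drawn from the example files in this thread, not CRISPResso's own validation.

```bash
# check_annotation_rows FILE.gz  (hypothetical helper, not part of CRISPResso)
# Reports problems on stderr and returns 1 if any data row fails a check.
check_annotation_rows() {
    gzip -dc "$1" | awk -F'\t' '
        NR == 1 { next }    # skip the header line
        NF != 16 {
            printf "line %d: %d fields, expected 16\n", NR, NF
            bad = 1
            next
        }
        $13 == "" {
            printf "line %d: empty name2\n", NR
            bad = 1
        }
        {
            # exonStarts ($10) and exonEnds ($11) end with a trailing comma,
            # so split() returns exonCount + 1 pieces, the last one empty
            if (split($10, s, ",") - 1 != $9 || split($11, e, ",") - 1 != $9) {
                printf "line %d: exon lists do not match exonCount\n", NR
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' >&2
}
```

Running check_annotation_rows UCSC_genome.gpred.gz prints nothing and returns 0 when every row passes.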
lucapinello commented 6 years ago

Thanks! This is really helpful!