maximilianh / crisporWebsite

All source code of the crispor.org website
http://crispor.org

output of guide cfd score #31

Closed genya closed 5 years ago

genya commented 5 years ago

The current version of crispor seems to calculate the guide CFD score but doesn't print it to the output. Instead, only the MIT specificity score and the number of off-targets are printed. Would you be able to indicate the portion of the code that should be modified to include the CFD score among the outputs? Thank you!

genya commented 5 years ago

The issue was just that the CFD score was missing from the header. Line 376 of crispor.py, guideHeaders = ["guideId", "targetSeq", "mitSpecScore", "offtargetCount", "targetGenomeGeneLocus"]

should be guideHeaders = ["guideId", "targetSeq", "mitSpecScore", "CFDscore", "offtargetCount", "targetGenomeGeneLocus"]
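To illustrate why a header list that is one entry short hides or shifts a column, here is a minimal sketch of writing a guide table with a length check. Only the `guideHeaders` list comes from this thread; `write_guide_table` and the example row values are made up for illustration and are not CRISPOR's actual code.

```python
import csv
import io

# Header list as corrected above; everything else in this sketch is hypothetical.
guideHeaders = ["guideId", "targetSeq", "mitSpecScore", "CFDscore",
                "offtargetCount", "targetGenomeGeneLocus"]

def write_guide_table(rows, headers, out):
    """Write rows as tab-separated text, refusing rows that do not match the header."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(headers)
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("row has %d fields, header has %d"
                             % (len(row), len(headers)))
        writer.writerow(row)

# Illustrative values only, not real CRISPOR output.
example_rows = [["g1", "GTACATCGGCTGAGTGACGGTGG", 85, 90, 12, "exon:ACACA"]]
buf = io.StringIO()
write_guide_table(example_rows, guideHeaders, buf)
```

A check like this would fail loudly when a score is computed but has no matching header, instead of silently producing a misaligned file.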

maximilianh commented 5 years ago

Sorry, I’m traveling and didn’t reply right away. Great that you found it. If I don’t reply right away and you’re stuck, just ping me again in the future.

genya commented 5 years ago

No worries, thank you Max! I'm repeating a genome-wide library design using CRISPOR because of mistakes in my previous iteration, so I re-installed the current version.

I noticed a weird discrepancy for the output of one guide GTACATCGGCTGAGTGACGGTGG (files attached). It seems that the guide itself is recognized as an offtarget and that this reduces the MIT specificity score, although the CFD score is unaffected. Running the website CRISPOR on the same input did not produce this error.

Here's the input:

HGNC:84 exon names ENST00000614428.4_15 range=chr17:37247991-37248176 strand=- ctcattctcattccctacagGTGACTCGACAGTCCCCCAACTCCTATGTGGTGATCATGAATGGCTCATGTGTAGAAGTAGATGTACATCGGCTGAGTGACGGTGGACTGCTCTTGTCCTATGATGGCAGCAGTTATACTACGTATATGAAAGAGGAAGTGGATAGgtaagtggctgtttgaggtc

I think the issue is that this sequence also appears on one of those free-floating alt contigs of the human genome (I call them that because I have little idea what they mean). The website CRISPOR recognizes the sequence as part of: Homo sapiens (hg38), chr17_KI270857v1_alt:1483980-1484165, reverse genomic strand

But its location in Ensembl and Genome Browser is: Chromosome 17: 37,084,994-37,359,116 reverse strand.

So perhaps crispor.py partially recognizes the sequence as being on both chr17 and chr17_KI27...v1_alt

GTACATCGGCTGAGTGACGGTGG.guides.txt GTACATCGGCTGAGTGACGGTGG.offtar.txt

genya commented 5 years ago

PS: the same occurs for other portions of the ACACA gene, for example:

HGNC:84 exon names ENST00000614428.4_7 range=chr17:37270731-37270881 strand=- ttttttctttctttttgaagGCAGCTGAGGAAGTTGGATATCCAGTAATGATCAAGGCCTCAGAGGGAGGAGGAGGGAAGGGAATTAGAAAAGTCAACAATGCAGATGACTTCCCTAATCTCTTCAGACAGgtagagtataagctgttttt

maximilianh commented 5 years ago

Yes there is code that recognizes _alt sequences... I wonder if I just fixed this very recently and haven’t pushed the change yet...?

One sec...

maximilianh commented 5 years ago

OK, I pushed the current code (which should have been pushed before, are you sure you did a git pull recently?)

You can look at the code, isAltChrom() is the relevant function, it's called in two places.

genya commented 5 years ago

Well, I recloned the git repository a few days ago, but in any case, running the sequence below on the CRISPOR website (just now) identifies it as being on chr17_KI270857v1_alt:1506687-1506837.

HGNC:84 exon names ENST00000614428.4_7 range=chr17:37270731-37270881 strand=- ttttttctttctttttgaagGCAGCTGAGGAAGTTGGATATCCAGTAATGATCAAGGCCTCAGAGGGAGGAGGAGGGAAGGGAATTAGAAAAGTCAACAATGCAGATGACTTCCCTAATCTCTTCAGACAGgtagagtataagctgttttt

By blat, the sequence has two 100% full-length matches:

chr17_KI270857v1_alt - 1506687 1506837

chr17 - 37270731 37270881

-- Evgeni (Genya) Frenkel, PhD Whitehead Institute for Biomedical Research Lab of David M Sabatini http://sabatinilab.wi.mit.edu/membersDS.html

maximilianh commented 5 years ago

Sorry, just to be sure I understood, to summarize: the current website places it on an _alt. This is not correct, right? It should place it on the main chrom and not show the alt at all?

genya commented 5 years ago

Right, when the input exactly matches the main chromosome and the alt, it should be assigned to main chromosome. Also, in the command line crispor, the guides for sequences that get assigned to alt have a blank in the column that indicates exon vs intergenic etc (can’t at the moment recall the header for this column) whereas when assigned to main chromosome, they have correct values in this column (eg exon:geneTargeted)
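The rule described here (equal matches on a main chromosome and on an alt should resolve to the main chromosome) could be sketched as follows. This is only an illustration: `is_alt_chrom` is a hypothetical stand-in that assumes alt contig names end in `_alt`, and it is not the `isAltChrom()` function from crispor.py. The hit list reuses the two blat matches reported earlier in this thread.

```python
def is_alt_chrom(chrom):
    """Assumed convention: alternate-locus sequence names end in '_alt'."""
    return chrom.endswith("_alt")

def pick_position(hits):
    """Return the first hit on a main chromosome, else the first hit, else None.

    hits: list of (chrom, start, end, strand) tuples that match equally well.
    """
    for hit in hits:
        if not is_alt_chrom(hit[0]):
            return hit
    return hits[0] if hits else None

# The two full-length blat matches quoted earlier in the thread.
blat_hits = [
    ("chr17_KI270857v1_alt", 1506687, 1506837, "-"),
    ("chr17", 37270731, 37270881, "-"),
]
best = pick_position(blat_hits)
```

Note that this only helps when the aligner reports both locations; if only the alt hit is returned, the function has nothing better to choose.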

maximilianh commented 5 years ago

Can you confirm that you did a git pull today ?

genya commented 5 years ago

Hi Max,

Yes, I just repeated it after updating from the repository and got the same result, likewise the CRISPOR website. Here's an example input that still produces the error:

gtttgtttatgcctttccagATTGGCATGGTAGCTTGGAAAATGACCTTTAAAAGTCCTGAATATCCAGAAGGCCGAGATATCATTGTTATTGGCAATGACATCACATACCGAATTGGGTCCTTTGGGCCTCAAGAGGATTTGTTATTTCTCAGAGCTTCCGAACTTGCTAGGGCAGAAGGTATTCCACGCATCTATGTATCAGCCAACAGTGGAGCAAGAATCGGACTGGCAGAAGAAATTCGCCATATGTTTCATGTGGCCTGGGTAGATCCTGAGGATCCTTACAAGgtacacactaagagcatata

maximilianh commented 5 years ago

crispor will prefer the main chromosome if it gets one from BWA.

I can confirm that BWA places this input sequence ONLY on the _alt sequence, and not on the main chromosome at all. With most other similar sequences, it will find both locations, but not for this one. This looks like a bug in bwasw. I have no idea what to do about it... any ideas what we could do? Do you want to play with the input sequence and see if bwasw will find the correct secondary match after a certain length maybe?

Well, we could switch to another aligner for the "find the best match" part, but good luck finding an aligner that is as fast as BWA and that reliably finds the best match... BLAT would take a while to start up and load the index. bowtie maybe?

Actually, "bwa aln" could be used for this, but it's not trivial to get this to work with good performance for longer sequences...

Another solution would be to remove all _alts from hg38. I don't think they're useful for crispr work anyways.
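As a rough sketch of that last option, the following filter drops every FASTA record whose name ends in `_alt`. The function name and the toy records are made up for illustration, and it assumes the alt sequences can be recognized by that suffix alone. The aligner index would then have to be rebuilt from the filtered file.

```python
def drop_alt_records(fasta_lines):
    """Yield FASTA lines, skipping every record whose name ends in '_alt'."""
    keep = True
    for line in fasta_lines:
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = not name.endswith("_alt")
        if keep:
            yield line

# Toy input, not real genome sequence.
toy_fasta = [
    ">chr17\n", "ACGTACGT\n",
    ">chr17_KI270857v1_alt\n", "ACGTACGT\n",
    ">chrM\n", "GGCC\n",
]
filtered = list(drop_alt_records(toy_fasta))
```

Because the function works line by line, it could be applied to an open file handle without loading the whole genome into memory.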

genya commented 5 years ago

I think the best solution may be to offer an option to drop the alts from the reference genome (can they be picked up as spurious offtargets?).

Bwa should then find the main chromosome alignment because its current behavior is consistent with its documentation, which says that it is not guaranteed to find all alignments:

http://bio-bwa.sourceforge.net/bwa.shtml “BWA does not guarantee to find all local hits as what BWT-SW is designed to do, but it is much faster than BWT-SW on both short and long query sequences.”

I don’t know if bwtsw is prohibitively slower.

maximilianh commented 5 years ago

Well, that’s the thing: I’m using BWA-SW here. It’s definitely not finding the secondary match. You can run it with -d, which will keep all temp files, and you can check the .sam file yourself. I don’t understand how something like this is possible... maybe it’s a feature of bwa sw that is just not documented?
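For anyone who wants to check the kept .sam file, a small sketch like the one below lists where each query was placed. It relies only on the standard SAM layout (tab-separated fields, header lines starting with `@`, FLAG in column 2, reference name in column 3, position in column 4, flag bit 0x4 meaning unmapped). The function name and the toy record are invented for illustration and are not taken from an actual CRISPOR run.

```python
def sam_hits(sam_lines):
    """Return (query, chrom, pos) for every mapped alignment line."""
    hits = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        if flag & 0x4:
            continue  # flag bit 0x4: the query is unmapped
        hits.append((qname, rname, pos))
    return hits

# Toy SAM content made up for this example.
toy_sam = [
    "@HD\tVN:1.0\n",
    "seq1\t16\tchr17_KI270857v1_alt\t1506688\t0\t151M\t*\t0\t0\t*\t*\n",
    "seq2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n",
]
placed = sam_hits(toy_sam)
chroms = sorted({chrom for _, chrom, _ in placed})
```

If `chroms` contains only `_alt` names for a query that is known to match a main chromosome, the aligner did not report the second location.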

genya commented 5 years ago

I think the best solution would be to split the genome into main chromosomes and the alts. If no match found on main chromosomes, then the code tries the alts. That way only BWA-SW is used and it may even be faster since most of the time the alts do not need to be searched. I'll try this out in the next day or two. From the BWA documentation it seems BWA-SW is guaranteed to find a match if it exists but is not guaranteed to find all matches.

-- Evgeni (Genya) Frenkel, PhD Whitehead Institute for Biomedical Research Lab of David M Sabatini http://sabatinilab.wi.mit.edu/membersDS.html



maximilianh commented 5 years ago

OK so this would be a one-off just for hg38: have two genome files, one main, one alt and run bwasw TWICE (which I have never done before). That's a great idea. I somehow never thought of it.

genya commented 5 years ago

ok glad to hear that! in your current code, since nonAltMatches are favored, it would make no difference to run bwasw first only on the main chromosomes and then run it again only if no match found.
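The main-first, alts-only-if-needed order described above can be sketched as follows. This is only an illustration under assumptions: `find_target`, the `Hit` tuple layout, and the two dictionary-backed stub "aligners" are hypothetical stand-ins for two separate bwasw runs, not code from crispor.py.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical hit record: (chromosome, start, end, strand).
Hit = Tuple[str, int, int, str]
Aligner = Callable[[str], List[Hit]]


def find_target(seq: str, search_main: Aligner, search_alts: Aligner) -> Optional[Hit]:
    """Return the first main-chromosome hit, falling back to the alts."""
    main_hits = search_main(seq)
    if main_hits:
        return main_hits[0]
    # Only reached when the main-chromosome search came back empty.
    alt_hits = search_alts(seq)
    return alt_hits[0] if alt_hits else None


# Stub "aligners" backed by dictionaries, standing in for two bwasw runs
# (one against a main-chromosome index, one against an alts-only index).
MAIN_INDEX = {"ACGTACGT": [("chr17", 100, 108, "+")]}
ALT_INDEX = {
    "ACGTACGT": [("chr17_KI270857v1_alt", 5, 13, "+")],
    "TTTTGGGG": [("chr17_KI270857v1_alt", 40, 48, "-")],
}


def search_main(seq: str) -> List[Hit]:
    return MAIN_INDEX.get(seq, [])


def search_alts(seq: str) -> List[Hit]:
    return ALT_INDEX.get(seq, [])
```

With these stubs, a sequence present in both indexes resolves to chr17, and the alt index is consulted only for a sequence that has no main-chromosome hit.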


maximilianh commented 5 years ago

Hm. Sorry I don't understand. My idea was to split the genome into the mains and the alts (a very small file), and always run on both to find the real target. For the off-target search I think we're fine, BWA ALN will always find all best matches.

genya commented 5 years ago

My understanding is that this part of crispor.py

    nonAltMatches = [x for x in matches if not isAltChrom(x[0])]
    if len(nonAltMatches)!=0:
        bestMatch = nonAltMatches[0]
    else:
        bestMatch = matches[0]

rejects the alt matches if a main match is found, so I was suggesting that it would then be equivalent to first run bwasw on the mains and only if no match found, run bwasw on the alts. I didn't know the alts were so small so I thought it might make a difference in performance.
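A runnable sketch of that selection logic, to make the failure mode concrete. The body of `isAltChrom` is an assumption (the thread does not show it; here it simply tests for an `_alt` suffix), and `pickBestMatch` is a hypothetical wrapper added only so the snippet can be called.

```python
def isAltChrom(chrom):
    # Assumption: alt sequences are recognizable by an "_alt" suffix,
    # as in chr17_KI270857v1_alt. The real check in crispor.py may differ.
    return chrom.endswith("_alt")


def pickBestMatch(matches):
    """matches: non-empty list of tuples whose first field is the chromosome name."""
    nonAltMatches = [x for x in matches if not isAltChrom(x[0])]
    if len(nonAltMatches) != 0:
        return nonAltMatches[0]
    return matches[0]


# The aligner reports both locations: the main chromosome wins.
both = [("chr17_KI270857v1_alt", 10), ("chr17", 500)]
# The aligner reports only the alt: the alt becomes the "best" match.
alt_only = [("chr17_KI270857v1_alt", 10)]
```

The second case is the situation discussed in this thread: the preference for main chromosomes cannot help if the aligner never returns the main-chromosome hit in the first place.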

About off-targets: If bwa-aln finds all the matches, couldn't it pick up sequences that are duplicated between the mains and the alts? So in the example sequence, which matches both chr17_KI270857v1_alt and chr17, if the chr17 alignment is treated as the target, could the code consider the matching chr17_KI270857v1_alt sequence as off-target, even though they're just different versions of the same portion of the genome? This seems to be happening for some of the guides in command-line crispor, although I don't see it in the website crispor outputs.


maximilianh commented 5 years ago

I wonder if the alts are relevant at all:

a) for off-targets, alts are simply ignored right now.

b) for the on-target sequence, in theory a user could paste a sequence on an alt that has no match on the main chromosome. I think this is a very unlikely case.

Instead of adding weird paths to the code, I wonder if I shouldn't simply remove the alts from hg38. I can't find a case where not having them really makes any difference... do you?


genya commented 5 years ago

I'm not sure what the significance of these alternative sequences is. I just skimmed the documentation (link below) and found it quite confusing, but assuming they're just another version of segments of the main chromosome assemblies, I see no reason to favor or consider them. But I don't know if that's really true. In any case, I was planning to ignore them because I wasn't really aware of them until I saw exon sequences pulled from the main chromosome assemblies unexpectedly aligning to the alts.

https://www.ncbi.nlm.nih.gov/grc/help/definitions/ Alternate locus: A sequence that provides an alternate representation of a locus found in a largely haploid assembly. These sequences don't represent a complete chromosome sequence although there is no hard limit on the size of the alternate locus; currently these are less than 1 Mb. Previously these sequences have been referred to as "partial chromosomes", "alternate alleles", and "alternate haplotypes". However, these terms are confusing because they contain terms that have biological implications. Diploid assemblies (which by definition are from a single individual) should not have alternate loci representations. Multiple scaffolds from different loci that are considered to be part of the same haplotype should be grouped into alternate locus groups (e.g. mouse 129/Sv group). Note: an alternate locus group was previously considered an alternate partial assembly.

https://www.ncbi.nlm.nih.gov/grc/help/faq/#difference-between-alternate-loci-and-novel-patch

What are alternate loci and novel patches?

Alternate loci and novel patches enable the reference assembly to represent allelic diversity. They are scaffold sequences that are given chromosome context through alignments to the corresponding chromosome regions. Alternate loci scaffolds and their alignments are included in major assembly releases, while novel patch scaffolds and their alignments are included in subsequent patch releases for that assembly. They can be considered functionally equivalent, as novel patches will be reassigned to the role of alternate loci scaffolds at the time of the next major assembly release. Assembly regions for which the GRC provides alternate loci or novel patch scaffolds are typically those with known alternate haplotypes (e.g. immune-associated regions), highly variable genomic regions (e.g. olfactory receptor regions) or those where there are structural variants having 5 Kb or more sequence not represented on the chromosome. Human alternate loci and all novel patch scaffolds also include one or more anchor sequence components to ensure their robust alignment to the chromosomes. Anchor sequences are component(s) that are also found in the corresponding chromosome. The sequence locations corresponding to anchor components are annotated on the GenBank records for all alternate loci and patch scaffolds. For more detail on patches please see Introductions to Patches https://www.ncbi.nlm.nih.gov/grc/help/patches.


maximilianh commented 5 years ago

Yes, these alts are a huge distraction for everyone and break many pipelines. I'll remove them.

genya commented 5 years ago

Sounds great, thank you! I'll rerun my pipeline after the update.


maximilianh commented 5 years ago

You don't even have to wait, you can simply use a fasta filter tool and remove all the alts. Or maybe NCBI or UCSC provides a genome without any alts?
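A minimal sketch of such a FASTA filter, in case no ready-made tool is at hand. It assumes that alt records can be identified by a sequence name ending in `_alt` (as in chr17_KI270857v1_alt); the function name `drop_alt_records` and the default predicate are hypothetical, and other non-main sequences would need a different predicate.

```python
import io


def drop_alt_records(in_handle, out_handle,
                     is_unwanted=lambda name: name.endswith("_alt")):
    """Stream a FASTA file, copying only records whose name passes the filter.

    Returns the number of records written.
    """
    keep = True
    kept = 0
    for line in in_handle:
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            keep = not is_unwanted(name)
            if keep:
                kept += 1
        if keep:
            out_handle.write(line)
    return kept


# Small in-memory demonstration.
src = io.StringIO(
    ">chr17 some description\nACGT\nGGCC\n"
    ">chr17_KI270857v1_alt\nTTTT\n"
    ">chr18\nAAAA\n"
)
out = io.StringIO()
kept = drop_alt_records(src, out)
```

For a real genome the two handles would be opened files; the function reads line by line, so it does not load the whole genome into memory.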


genya commented 5 years ago

I tried to do as you suggest with the following:

./crisprAddGenome fasta hg38_noAlt.fa --desc 'hg38|Homo sapiens|human|hg38.96' --gff Homo_sapiens.GRCh38.96.gff3

where hg38_noAlt.fa is the human genome fasta file with the non-main chromosomes removed, and got the following error:

      File "./crisprAddGenome", line 77
        print msg
                ^
    SyntaxError: Missing parentheses in call to 'print'. Did you mean print(msg)?

I can't run as sudo, don't have root privileges, but gffread and twoBitToFa are installed. Do you know what I'm doing wrong? Please let me know.

Thank you! Genya


maximilianh commented 5 years ago

Hm, yes, sorry, this is a Python 2.7 script. On many of today's Linux distributions, Python 3.6 is the default, and because Guido van Rossum is a very stubborn guy, he changed the most essential statement, print, to be incompatible. I can fix this one quickly.
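For reference, the incompatibility is that `print msg` is a statement in Python 2 and a syntax error in Python 3, where `print` is a function. A small sketch of a form that runs under both 2.7 and 3.x; the helper name `report` is hypothetical and is not taken from crisprAddGenome.

```python
from __future__ import print_function  # enables print() on 2.7, no effect on 3.x

import io
import sys


def report(msg, stream=None):
    """Write a message followed by a newline, to stderr unless a stream is given."""
    if stream is None:
        stream = sys.stderr
    print(msg, file=stream)


# Capture the output in memory to show what gets written.
buf = io.StringIO()
report(u"genome added", buf)
```

The `__future__` import has to appear before the other imports in the file; with it in place, the same source parses under both interpreters.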


genya commented 5 years ago

I tried running in python2 virtual environment and got error indicating that our server does not have mysql. Our IT people should be able to fix that...

    (crispor)$ ./crisprAddGenome fasta hg38_noAlt.fa --desc 'hg38|Homo sapiens|human|hg38.96' --gff Homo_sapiens.GRCh38.96.gff3 h38.96.gff3
    Traceback (most recent call last):
      File "./crisprAddGenome", line 7, in <module>
        import MySQLdb # install with "sudo apt-get install python-mysqldb"
    ImportError: No module named MySQLdb
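One way to see up front which modules a given environment lacks, rather than hitting one ImportError at a time, is a small check like the sketch below. The helper `missing_modules` is hypothetical; only `MySQLdb` is a dependency named in this thread, so any other names passed in are for the caller to supply.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported here."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# "json" ships with Python; the second name is deliberately nonexistent.
result = missing_modules(["json", "no_such_module_xyz_123"])
```

Running `missing_modules(["MySQLdb"])` inside the virtual environment shows whether the module is visible there. Whether it can be installed without root depends on the system.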


maximilianh commented 5 years ago

I think I got the script converted to py3... one sec...

On Wed, Jun 19, 2019 at 11:07 PM Evgeni Frenkel notifications@github.com wrote:

I tried running in python2 virtual environment and got error indicating that our server does not have mysql. Our IT people should be able to fix that...

(crispor)$ ./crisprAddGenome fasta hg38_noAlt.fa --desc 'hg38|Homo sapiens|human|hg38.96' --gff Homo_sapiens.GRCh38.96.gff3 h38.96.gff3 Traceback (most recent call last): File "./crisprAddGenome", line 7, in import MySQLdb # install with "sudo apt-get install python-mysqldb" ImportError: No module named MySQLdb

-- Evgeni (Genya) Frenkel, PhD Whitehead Institute for Biomedical Research Lab of David M Sabatini http://sabatinilab.wi.mit.edu/membersDS.html

On Wed, Jun 19, 2019 at 5:58 PM Maximilian Haeussler < notifications@github.com> wrote:

Hm, yes, sorry, this is a python2.7 script. In many of today's Linuxes, python 3.6 is the default, and because Guido van Rossum is a very stubborn guy, he changed the most essential statement, print, to be incompatible. I can fix this one quickly.
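For reference, a small sketch of output calls that parse under both Python 2.7 and Python 3. The helper names `log` and `log_err` are made up for illustration and are not taken from `crisprAddGenome`:

```python
import sys


def log(msg):
    """Write one message to stdout on either Python version."""
    # A single parenthesized argument is read as a print statement by
    # Python 2.7 and as a function call by Python 3, with the same output.
    print(msg)


def log_err(msg):
    """Write one message to stderr on either Python version."""
    # "print >> sys.stderr, msg" is Python 2 only; a plain write is portable.
    sys.stderr.write(str(msg) + "\n")
```

For calls with several arguments or keyword options, putting `from __future__ import print_function` as the first statement of a Python 2.7 script gives it the Python 3 behaviour.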

On Wed, Jun 19, 2019 at 8:32 PM Evgeni Frenkel < notifications@github.com> wrote:

I tried to do as you suggest with the following:

./crisprAddGenome fasta hg38_noAlt.fa --desc 'hg38|Homo sapiens|human|hg38.96' --gff Homo_sapiens.GRCh38.96.gff3

where hg38_noAlt.fa is the human genome fasta file with the non-main chromosomes removed, and got the following error:

  File "./crisprAddGenome", line 77
    print msg
            ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(msg)?

I can't run as sudo, don't have root privileges, but gffread and twoBitToFa are installed. Do you know what I'm doing wrong? Please let me know.

Thank you! Genya

-- Evgeni (Genya) Frenkel, PhD Whitehead Institute for Biomedical Research Lab of David M Sabatini http://sabatinilab.wi.mit.edu/membersDS.html

On Wed, Jun 19, 2019 at 12:42 PM Maximilian Haeussler < notifications@github.com> wrote:

You don't even have to wait: you can simply use a FASTA filter tool and remove all the alts. Or maybe NCBI or UCSC provides a genome without any alts?
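A minimal sketch of such a filter, in case no ready-made tool is at hand. It assumes UCSC-style sequence names, where alternate, random and unplaced sequences contain an underscore (for example `chr1_KI270762v1_alt`) and main chromosomes do not. That naming rule is an assumption and will not hold for every assembly source, so `is_main_chrom` would need adapting; both function names are hypothetical:

```python
def is_main_chrom(name):
    """Assumed rule: UCSC-style main chromosomes have no underscore."""
    return "_" not in name


def filter_fasta(lines, keep=is_main_chrom):
    """Yield only the FASTA lines that belong to records accepted by `keep`."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            # The sequence name is the first word after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keeping = keep(name)
        if keeping:
            yield line
```

Usage would be along the lines of opening the input and output files and calling `out.writelines(filter_fasta(inp))`. Because the function is a generator, the genome is streamed line by line rather than loaded into memory.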

On Wed, Jun 19, 2019 at 5:39 PM Evgeni Frenkel < notifications@github.com> wrote:

Sounds great, thank you! I'll rerun my pipeline after the update.

-- Evgeni (Genya) Frenkel, PhD Whitehead Institute for Biomedical Research Lab of David M Sabatini http://sabatinilab.wi.mit.edu/membersDS.html

On Wed, Jun 19, 2019 at 12:32 PM Maximilian Haeussler < notifications@github.com> wrote:

Yes, these alts are a huge distraction for everyone and break many pipelines. I'll remove them.

maximilianh commented 5 years ago

OK, can you do a git pull and retry? It should work with py3 now. At least I'm running it right now on python3.

genya commented 5 years ago

Just tried it. Now, with both python2 and python3, I get the same error message indicating that python-mysqldb needs to be installed. I contacted our IT department about satisfying this dependency and will let you know how it works once that's done.

Thanks for your help! Genya

-- Evgeni (Genya) Frenkel, PhD Whitehead Institute for Biomedical Research Lab of David M Sabatini http://sabatinilab.wi.mit.edu/membersDS.html
