lucapinello / CRISPResso

Software pipeline for the analysis of CRISPR-Cas9 genome editing outcomes from sequencing data

Deprecation of convert_objects causing fatal error #3

Closed aaiezza closed 8 years ago

aaiezza commented 8 years ago

I'm running CRISPRessoPooled in mixed-mode with the following command:

CRISPRessoPooled \
    --fastq_r1 A5_S184_L001_R1_001.fastq.gz \
    --fastq_r2 A5_S184_L001_R2_001.fastq.gz \
    --amplicons_file amplicons_description.txt \
    --bowtie2_index /data/ref_genome/mouse/musculus \
    --gene_annotations /data/ref_genome_annot/ucsc/mouse/vMM10.annotation.gz \
    --n_processes 4 \
    --name A5_S184 \
    --output_folder cspresso/A5_S184 \
    --save_also_png

This leads to the following output:

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Checking dependencies...

INFO  @ Tue, 12 Jul 2016 19:03:09:

 All the required dependencies are present!

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Amplicon description file and bowtie2 reference genome index files provided. The analysis will be perfomed using the reads that are aligned ony to the amplicons provided and not to other genomic regions.

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Creating Folder /cvri/miano/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Done!

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Merging paired sequences with Flash...

INFO  @ Tue, 12 Jul 2016 19:03:10:
         Done!

INFO  @ Tue, 12 Jul 2016 19:03:10:
         Loading gene coordinates from annotation file: /cvri/data/ref_genome_annot/ucsc/mouse/vMM10.annotation.gz...

INFO  @ Tue, 12 Jul 2016 19:03:11:
         The uncompressed reference fasta file for /cvri/data/ref_genome/mouse/musculus is already present! Skipping generation.

INFO  @ Tue, 12 Jul 2016 19:03:11:
         Aligning reads to the provided genome index...

INFO  @ Tue, 12 Jul 2016 18:48:20:
         Demultiplexing reads by location...

gzip: /cvri/miano/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184/MAPPED_REGIONS//*.fastq: No such file or directory
INFO  @ Tue, 12 Jul 2016 18:48:20:
         Reporting problematic regions...

/usr/local/lib/python2.7/dist-packages/CRISPResso/CRISPRessoPooledCORE.py:770: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  df_regions=df_regions.convert_objects(convert_numeric=True)
CRITICAL @ Tue, 12 Jul 2016 18:48:20:

ERROR: Cannot set a frame with no defined index and a value that cannot be converted to a Series

~~~CRISPRessoPooled~~~
-Analysis of CRISPR/Cas9 outcomes from POOLED deep sequencing data-
              )                                            )
             (           _______________________          (
            __)__       | __  __  __     __ __  |        __)__
         C\|     \      ||__)/  \/  \|  |_ |  \ |     C\|     \
           \     /      ||   \__/\__/|__|__|__/ |       \     /
            \___/       |_______________________|        \___/

[Luca Pinello 2015, send bugs, suggestions or *green coffee* to lucapinello AT gmail DOT com]

Version 0.9.4

Mapping amplicons to the reference genome...

At this point the program stops executing. I found that if you change lines 771 and 801 of CRISPRessoPooledCORE.py to df_regions=df_regions.apply(pd.to_numeric, errors='ignore'), this problem goes away, yielding this new output:

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Checking dependencies...

INFO  @ Tue, 12 Jul 2016 19:03:09:

 All the required dependencies are present!

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Amplicon description file and bowtie2 reference genome index files provided. The analysis will be perfomed using the reads that are aligned ony to the amplicons provided and not to other genomic regions.

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Creating Folder /cvri/miano/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Done!

INFO  @ Tue, 12 Jul 2016 19:03:09:
         Merging paired sequences with Flash...

INFO  @ Tue, 12 Jul 2016 19:03:10:
         Done!

INFO  @ Tue, 12 Jul 2016 19:03:10:
         Loading gene coordinates from annotation file: /cvri/data/ref_genome_annot/ucsc/mouse/vMM10.annotation.gz...

INFO  @ Tue, 12 Jul 2016 19:03:11:
         The uncompressed reference fasta file for /cvri/data/ref_genome/mouse/musculus is already present! Skipping generation.

INFO  @ Tue, 12 Jul 2016 19:03:11:
         Aligning reads to the provided genome index...

gzip: /cvri/miano/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184/MAPPED_REGIONS//*.fastq: No such file or directory
INFO  @ Tue, 12 Jul 2016 19:05:56:
         Reporting problematic regions...

CRITICAL @ Tue, 12 Jul 2016 19:05:56:

ERROR: Cannot set a frame with no defined index and a value that cannot be converted to a Series

~~~CRISPRessoPooled~~~
-Analysis of CRISPR/Cas9 outcomes from POOLED deep sequencing data-

              )                                            )
             (           _______________________          (
            __)__       | __  __  __     __ __  |        __)__
         C\|     \      ||__)/  \/  \|  |_ |  \ |     C\|     \
           \     /      ||   \__/\__/|__|__|__/ |       \     /
            \___/       |_______________________|        \___/

[Luca Pinello 2015, send bugs, suggestions or *green coffee* to lucapinello AT gmail DOT com]

Version 0.9.4

Mapping amplicons to the reference genome...

There is still an error, but this time it continues to run, even though all that was fixed was a deprecation. I'm not really sure whether that is a good thing or not...
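For reference, the replacement described above can be sketched in isolation. This is a minimal illustration and not the CRISPResso code: the column names and values are made up, and the helper to_numeric_where_possible is hypothetical. It converts column by column with pd.to_numeric and leaves a column untouched when conversion fails, which avoids errors='ignore' (itself deprecated in later pandas releases). Its behaviour is not guaranteed to match convert_objects(convert_numeric=True) in every edge case.

```python
import pandas as pd


def to_numeric_where_possible(df):
    """Return a copy of df with each column converted to a numeric dtype
    when every value parses as a number; other columns are left as they are.
    (Hypothetical helper, not part of CRISPResso.)"""
    out = df.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # Column holds non-numeric values (e.g. chromosome names): keep it.
            pass
    return out


# Made-up stand-in for a regions table read as strings.
df_regions = pd.DataFrame(
    {
        "chr_id": ["chr1", "chr2"],
        "bpstart": ["100", "200"],
        "n_reads": ["5", "7"],
    }
)
converted = to_numeric_where_possible(df_regions)
print(converted.dtypes)
```

Converting column by column keeps text columns intact while still giving numeric columns a numeric dtype for later arithmetic.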

lucapinello commented 8 years ago

Hi, thanks for reporting this!

The error you are mentioning, "ERROR: Cannot set a frame with no defined index and a value that cannot be converted to a Series", is already fixed in the latest version, 0.9.7.

The warning about convert_objects has no effect on the output.

Could you please try to upgrade to the latest version 0.9.7 and let me know if you still get the fatal error?

Thanks!

aaiezza commented 8 years ago

Oh dear. Forgive me for posting without noticing that first.

Newish command:

CRISPRessoPooled \
    --fastq_r1 A5_S184_L001_R1_001.fastq.gz \
    --fastq_r2 A5_S184_L001_R2_001.fastq.gz \
    --amplicons_file amplicons_description.txt \
    --bowtie2_index /data/ref_genome/mouse/musculus \
    --gene_annotations /data/ref_genome_annot/ucsc/mouse/vMM10.annotation.gz \
    --n_processes 4 \
    --name A5_S184 \
    --trim_sequences \
    --max_paired_end_reads_overlap 250 \
    --exclude_bp_from_left 10 \
    --exclude_bp_from_right 10 \
    --output_folder cspresso/A5_S184 \
    --save_also_png

OUTPUT:

Sorry it's so long. There may be other problems at play here.

INFO  @ Wed, 13 Jul 2016 15:42:58:
         Checking dependencies...

INFO  @ Wed, 13 Jul 2016 15:42:58:

 All the required dependencies are present!

INFO  @ Wed, 13 Jul 2016 15:42:58:
         Amplicon description file and bowtie2 reference genome index files provided. The analysis will be perfomed using the reads that are aligned ony to the amplicons provided and not to other genomic regions.

INFO  @ Wed, 13 Jul 2016 15:42:58:
         Creating Folder /gpfs/fs2/scratch/aaiezza/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184

WARNING @ Wed, 13 Jul 2016 15:42:58:
         Folder /gpfs/fs2/scratch/aaiezza/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184 already exists.

INFO  @ Wed, 13 Jul 2016 15:42:58:
         Trimming sequences with Trimmomatic...

INFO  @ Wed, 13 Jul 2016 15:43:00:
         Done!

INFO  @ Wed, 13 Jul 2016 15:43:00:
         Merging paired sequences with Flash...

INFO  @ Wed, 13 Jul 2016 15:43:01:
         Done!

INFO  @ Wed, 13 Jul 2016 15:43:01:
         Loading gene coordinates from annotation file: /gpfs/fs2/scratch/aaiezza/data/ref_genome_annot/ucsc/mouse/vMM10.annotation.gz...

1 reads; of these:
  1 (100.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
INFO  @ Wed, 13 Jul 2016 15:43:03:
         The amplicon [Fmn1] is not mappable to the reference genome provided!

1 reads; of these:
  1 (100.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
INFO  @ Wed, 13 Jul 2016 15:43:04:
         The amplicon [Dntt] is not mappable to the reference genome provided!

1 reads; of these:
  1 (100.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
INFO  @ Wed, 13 Jul 2016 15:43:05:
         The amplicon [Ankrd10] is not mappable to the reference genome provided!

1 reads; of these:
  1 (100.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
INFO  @ Wed, 13 Jul 2016 15:43:07:
         The amplicon [Mt1] is not mappable to the reference genome provided!

1 reads; of these:
  1 (100.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
INFO  @ Wed, 13 Jul 2016 15:43:08:
         The amplicon [Psmd13] is not mappable to the reference genome provided!

1 reads; of these:
  1 (100.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
INFO  @ Wed, 13 Jul 2016 15:43:10:
         The amplicon [Asap1] is not mappable to the reference genome provided!

1 reads; of these:
  1 (100.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
INFO  @ Wed, 13 Jul 2016 15:43:11:
         The amplicon [chr10_1] is not mappable to the reference genome provided!

1 reads; of these:
  1 (100.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
INFO  @ Wed, 13 Jul 2016 15:43:12:
         The amplicon [chr14] is not mappable to the reference genome provided!

1 reads; of these:
  1 (100.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
INFO  @ Wed, 13 Jul 2016 15:43:14:
         The amplicon [chr13] is not mappable to the reference genome provided!

1 reads; of these:
  1 (100.00%) were unpaired; of these:
    1 (100.00%) aligned 0 times
    0 (0.00%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.00% overall alignment rate
INFO  @ Wed, 13 Jul 2016 15:43:15:
         The amplicon [chr10] is not mappable to the reference genome provided!

WARNING @ Wed, 13 Jul 2016 15:43:15:
         The amplicon sequence Fmn1 provided:
GTCTTTGAGTTTGGGCAGAATTTCTAAACTATATCCGTCTGCTTGCCCTCGCGTCCTGTTTCCTCCATTCATGTAGTTTCCAAAAGCCAGAATGAGAGCTAAGATATCCTTCACACTCTTCATGTGCAACAAGCCCTGCAGTGGCAAACACCAGCAGTTATGGTCTGAACTACAAACACAGTCATATTCGCTCGCTCCACAGCTAACCCTCATTAAAGTAACCAAATCCTGGTAAGTGGCTTGTGATTAGTTGTATCAACAGTTGGCAAATACAAGATACATTTCACTACAGCAGTATCATGTGGG

is different from the reference sequence(both strand):

GTCTTTGAGTTTGGGCAGAATTTCTAAACTATATCCGTCTGCTTGCCCTCGCGTCCTGTTTCCTCCATTCATGTAGTTTCCAAAAGCCAGAATGAGAGCTAAGATATCCTTCACACTCTTCATGTGCAACAAGCCCTGCAGTGGCAAACACCAGCAGTTATGGTCTGAACTACAAACACAGTCATATTCGCTCGCTCCACAGCTAACCCTCATTAAAGTAACCAAATCCTGGTAAGTGGCTTGTGATTAGTTGTATCAACAGTTGGCAAATACAAGATACATTTCACTACAGCAGTATCATGTGGG

CCCACATGATACTGCTGTAGTGAAATGTATCTTGTATTTGCCAACTGTTGATACAACTAATCACAAGCCACTTACCAGGATTTGGTTACTTTAATGAGGGTTAGCTGTGGAGCGAGCGAATATGACTGTGTTTGTAGTTCAGACCATAACTGCTGGTGTTTGCCACTGCAGGGCTTGTTGCACATGAAGAGTGTGAAGGATATCTTAGCTCTCATTCTGGCTTTTGGAAACTACATGAATGGAGGAAACAGGACGCGAGGGCAAGCAGACGGATATAGTTTAGAAATTCTGCCCAAACTCAAAGAC

WARNING @ Wed, 13 Jul 2016 15:43:15:
         The amplicon sequence Dntt provided:
TAGAAACAACGCTCCTACTGTCCATTTATCCACTCAACAGATATTCACACATCACCTGCCGCCTGTTGGGCAGTGATTGACAAGGGTCCAAGCCAATCAGCTTCTTGTTCACATTGAGTTTCTGTTCTAGTACAGGGAGCCCAGAGCTCAGCCACCCGGGAGCTTTGCCCTGAGGAAGAAAGTCACCCCAAAAATTTATGTTAGAGACAGCAGTTTCAAACACCCAAGGGCTTTGGATAGCTCTAAACACCGTGTACCCGAAATAATCTGGACTAGACGGTAATTTGTTTTAATTCTCTTTGTAGCAGTTTGAGAGAGACTTGCGG

is different from the reference sequence(both strand):

TAGAAACAACGCTCCTACTGTCCATTTATCCACTCAACAGATATTCACACATCACCTGCCGCCTGTTGGGCAGTGATTGACAAGGGTCCAAGCCAATCAGCTTCTTGTTCACATTGAGTTTCTGTTCTAGTACAGGGAGCCCAGAGCTCAGCCACCCGGGAGCTTTGCCCTGAGGAAGAAAGTCACCCCAAAAATTTATGTTAGAGACAGCAGTTTCAAACACCCAAGGGCTTTGGATAGCTCTAAACACCGTGTACCCGAAATAATCTGGACTAGACGGTAATTTGTTTTAATTCTCTTTGTAGCAGTTTGAGAGAGACTTGCGG

CCGCAAGTCTCTCTCAAACTGCTACAAAGAGAATTAAAACAAATTACCGTCTAGTCCAGATTATTTCGGGTACACGGTGTTTAGAGCTATCCAAAGCCCTTGGGTGTTTGAAACTGCTGTCTCTAACATAAATTTTTGGGGTGACTTTCTTCCTCAGGGCAAAGCTCCCGGGTGGCTGAGCTCTGGGCTCCCTGTACTAGAACAGAAACTCAATGTGAACAAGAAGCTGATTGGCTTGGACCCTTGTCAATCACTGCCCAACAGGCGGCAGGTGATGTGTGAATATCTGTTGAGTGGATAAATGGACAGTAGGAGCGTTGTTTCTA

WARNING @ Wed, 13 Jul 2016 15:43:15:
         The amplicon sequence Ankrd10 provided:
TGTGCACAGCGTGCTATTTCACACTAGGAATTGGCAAGAATCTCTAGGGAGTGCCAACACGTTTCCTGAGTCAGACAGTACTGGAAAACACCAGAAGGCCCACGTGGCCTTCTGACATCCAGGACGACCTGCCTGCTGGCATGAGAAGAGCGAAGAGCTTCTTTCTCCTGCCTGACCAGGAAGGGAACAATGCTGTCTCCATAAAGGAGAGGCTCTGGCT

is different from the reference sequence(both strand):

TGTGCACAGCGTGCTATTTCACACTAGGAATTGGCAAGAATCTCTAGGGAGTGCCAACACGTTTCCTGAGTCAGACAGTACTGGAAAACACCAGAAGGCCCACGTGGCCTTCTGACATCCAGGACGACCTGCCTGCTGGCATGAGAAGAGCGAAGAGCTTCTTTCTCCTGCCTGACCAGGAAGGGAACAATGCTGTCTCCATAAAGGAGAGGCTCTGGCT

AGCCAGAGCCTCTCCTTTATGGAGACAGCATTGTTCCCTTCCTGGTCAGGCAGGAGAAAGAAGCTCTTCGCTCTTCTCATGCCAGCAGGCAGGTCGTCCTGGATGTCAGAAGGCCACGTGGGCCTTCTGGTGTTTTCCAGTACTGTCTGACTCAGGAAACGTGTTGGCACTCCCTAGAGATTCTTGCCAATTCCTAGTGTGAAATAGCACGCTGTGCACA

WARNING @ Wed, 13 Jul 2016 15:43:15:
         The amplicon sequence Mt1 provided:
GGACATTTCTCAGAGCCAGTTTTGTAGGAGTTCCCCGCCCCTAGCCTTAGCCGCCACCCAAGGTGTCCCAACTCACTCTTCTTGCAGGAGGTGCACTTGCAGTTCTTGCAGGCGCAGGAGCTGGTGCAAGTGCAGGAGCCGCCTGGGGAGGAGAAAGAAGACAGCATGAGGGAGGCAGCATTACAGCAGTGGCCAACACCACGAGTCCCGGCTCAGTTCACTAAGTCCTCCTCGGAGCTGCAGGGAGCCTAGCCCCACTTTTCTCCTCACAGGTTAAGTCAGGGATTATGTCTTTGAGTCCCAAGACATAAAGGTCCTTCACCTCTTTCT

is different from the reference sequence(both strand):

GGACATTTCTCAGAGCCAGTTTTGTAGGAGTTCCCCGCCCCTAGCCTTAGCCGCCACCCAAGGTGTCCCAACTCACTCTTCTTGCAGGAGGTGCACTTGCAGTTCTTGCAGGCGCAGGAGCTGGTGCAAGTGCAGGAGCCGCCTGGGGAGGAGAAAGAAGACAGCATGAGGGAGGCAGCATTACAGCAGTGGCCAACACCACGAGTCCCGGCTCAGTTCACTAAGTCCTCCTCGGAGCTGCAGGGAGCCTAGCCCCACTTTTCTCCTCACAGGTTAAGTCAGGGATTATGTCTTTGAGTCCCAAGACATAAAGGTCCTTCACCTCTTTCT

AGAAAGAGGTGAAGGACCTTTATGTCTTGGGACTCAAAGACATAATCCCTGACTTAACCTGTGAGGAGAAAAGTGGGGCTAGGCTCCCTGCAGCTCCGAGGAGGACTTAGTGAACTGAGCCGGGACTCGTGGTGTTGGCCACTGCTGTAATGCTGCCTCCCTCATGCTGTCTTCTTTCTCCTCCCCAGGCGGCTCCTGCACTTGCACCAGCTCCTGCGCCTGCAAGAACTGCAAGTGCACCTCCTGCAAGAAGAGTGAGTTGGGACACCTTGGGTGGCGGCTAAGGCTAGGGGCGGGGAACTCCTACAAAACTGGCTCTGAGAAATGTCC

WARNING @ Wed, 13 Jul 2016 15:43:15:
         The amplicon sequence Psmd13 provided:
CATGATAGGGCATGGGATCAAGTCACAAAACCAGGAACACGTCTGCTGGAGCAGCAATTTCAGGATTAGGAGGCATCAGCAGCGCAGCCAGCCTGGAAGTGCAGGGCGCAGACCTCAGAGGGCTGCTTTGCTAGGCCTCCACAGCAGTAATCCCACCTTGTGTGCGACAGCGTCCGCTTCCTAAATGGCTTCTGTCCACATAAAACTAGGACAGCATTGGTAAACCCGCAAAGGGCAGGGGCCCAGCCCCCTTACCTGCTGCAAATCCAGCACTCGCGGCTGCACCCACGTCATGTGAACCCGCTTGTCCACCTCGTCTATGCTGCCTCTCACCAGCCCCACCGAGAGTGCCTTCATCACCAGCAACTCCA

is different from the reference sequence(both strand):

CATGATAGGGCATGGGATCAAGTCACAAAACCAGGAACACGTCTGCTGGAGCAGCAATTTCAGGATTAGGAGGCATCAGCAGCGCAGCCAGCCTGGAAGTGCAGGGCGCAGACCTCAGAGGGCTGCTTTGCTAGGCCTCCACAGCAGTAATCCCACCTTGTGTGCGACAGCGTCCGCTTCCTAAATGGCTTCTGTCCACATAAAACTAGGACAGCATTGGTAAACCCGCAAAGGGCAGGGGCCCAGCCCCCTTACCTGCTGCAAATCCAGCACTCGCGGCTGCACCCACGTCATGTGAACCCGCTTGTCCACCTCGTCTATGCTGCCTCTCACCAGCCCCACCGAGAGTGCCTTCATCACCAGCAACTCCA

TGGAGTTGCTGGTGATGAAGGCACTCTCGGTGGGGCTGGTGAGAGGCAGCATAGACGAGGTGGACAAGCGGGTTCACATGACGTGGGTGCAGCCGCGAGTGCTGGATTTGCAGCAGGTAAGGGGGCTGGGCCCCTGCCCTTTGCGGGTTTACCAATGCTGTCCTAGTTTTATGTGGACAGAAGCCATTTAGGAAGCGGACGCTGTCGCACACAAGGTGGGATTACTGCTGTGGAGGCCTAGCAAAGCAGCCCTCTGAGGTCTGCGCCCTGCACTTCCAGGCTGGCTGCGCTGCTGATGCCTCCTAATCCTGAAATTGCTGCTCCAGCAGACGTGTTCCTGGTTTTGTGACTTGATCCCATGCCCTATCATG

WARNING @ Wed, 13 Jul 2016 15:43:15:
         The amplicon sequence Asap1 provided:
GTTTGGGTGGCATGAGTTTATCAAATAAGAGGGTAAGGCGTGTCAAAAATGACCACCACACAGAGCCCTCCCAGCTCCAGCCGGTTTGCTCCTGGCTCTGCAGATGAACGAGTCAAGATTATTCCAAGCTCAGCAGTGGTAAACAGCCAGGGATTTCTTTCTGAATTCACCGGAGAGCCCAAACTGCCGCCACCAATTGCTGTTTCAGTTTCCTCCGAAGATTATGTTATGTATCTGCCCCTCCCCTGCCTCCTCCAGCCAAGAGGGGACTATATGAACAATGAGATATTGTGCTCTGGTAAGCA

is different from the reference sequence(both strand):

GTTTGGGTGGCATGAGTTTATCAAATAAGAGGGTAAGGCGTGTCAAAAATGACCACCACACAGAGCCCTCCCAGCTCCAGCCGGTTTGCTCCTGGCTCTGCAGATGAACGAGTCAAGATTATTCCAAGCTCAGCAGTGGTAAACAGCCAGGGATTTCTTTCTGAATTCACCGGAGAGCCCAAACTGCCGCCACCAATTGCTGTTTCAGTTTCCTCCGAAGATTATGTTATGTATCTGCCCCTCCCCTGCCTCCTCCAGCCAAGAGGGGACTATATGAACAATGAGATATTGTGCTCTGGTAAGCA

TGCTTACCAGAGCACAATATCTCATTGTTCATATAGTCCCCTCTTGGCTGGAGGAGGCAGGGGAGGGGCAGATACATAACATAATCTTCGGAGGAAACTGAAACAGCAATTGGTGGCGGCAGTTTGGGCTCTCCGGTGAATTCAGAAAGAAATCCCTGGCTGTTTACCACTGCTGAGCTTGGAATAATCTTGACTCGTTCATCTGCAGAGCCAGGAGCAAACCGGCTGGAGCTGGGAGGGCTCTGTGTGGTGGTCATTTTTGACACGCCTTACCCTCTTATTTGATAAACTCATGCCACCCAAAC

WARNING @ Wed, 13 Jul 2016 15:43:15:
         The amplicon sequence chr10_1 provided:
GATGGAGGATGGGAAGAACAATAATTAGAGGGCCACGGTCACGGGATGCGCACAGGCAGAGCTCCTCAGCGCCTCTCAGATGTGAGGCCGAAGCCTAATTATGAAAAGCTGCTGGGTCGGAAGACAGAGGCTGCTGTCTTGGGACATCAGATGCATAAGTGAGATTACTTTTCAGGATAGTGATAAACAACAGGCGTAAACACCCGAGGGAGGGATGGAAAACAGACTCGTGGTCTCTGATGAGGAGATCAGTACCCAGGTTTCGCTCTCCTTAGGGTGACTTCATCAGTGG

is different from the reference sequence(both strand):

GATGGAGGATGGGAAGAACAATAATTAGAGGGCCACGGTCACGGGATGCGCACAGGCAGAGCTCCTCAGCGCCTCTCAGATGTGAGGCCGAAGCCTAATTATGAAAAGCTGCTGGGTCGGAAGACAGAGGCTGCTGTCTTGGGACATCAGATGCATAAGTGAGATTACTTTTCAGGATAGTGATAAACAACAGGCGTAAACACCCGAGGGAGGGATGGAAAACAGACTCGTGGTCTCTGATGAGGAGATCAGTACCCAGGTTTCGCTCTCCTTAGGGTGACTTCATCAGTGG

CCACTGATGAAGTCACCCTAAGGAGAGCGAAACCTGGGTACTGATCTCCTCATCAGAGACCACGAGTCTGTTTTCCATCCCTCCCTCGGGTGTTTACGCCTGTTGTTTATCACTATCCTGAAAAGTAATCTCACTTATGCATCTGATGTCCCAAGACAGCAGCCTCTGTCTTCCGACCCAGCAGCTTTTCATAATTAGGCTTCGGCCTCACATCTGAGAGGCGCTGAGGAGCTCTGCCTGTGCGCATCCCGTGACCGTGGCCCTCTAATTATTGTTCTTCCCATCCTCCATC

WARNING @ Wed, 13 Jul 2016 15:43:15:
         The amplicon sequence chr14 provided:
CCCTGAGATCAACACTGTCTTCCCACACAAAATGCTCACGCTGCCATTTAATGTCAGGTAAACAGACTTGTACTTAGTAAAAGCTTCGTGGAATTGTTCATCTCTACAGAGGGCAGCCACCAGCAACCTACTGGATCAGGAACCCACGCACCATCAAAGAGGAAAAGCATCCGTGGTAAACACCCGAGGTGATGAACCTGCTCCCAAAGAGCAAAGACAAAAACTAACTCAACCTGCCGCACAGACACACATGCTCGTTCTTTTTTTTTCTTTTTTGGTTTTTCCAGACAGGGTTTCTCTGTATAGCCCTGGCTGTCCTGGAACTCACTTTGTAGACCAGG

is different from the reference sequence(both strand):

CCCTGAGATCAACACTGTCTTCCCACACAAAATGCTCACGCTGCCATTTAATGTCAGGTAAACAGACTTGTACTTAGTAAAAGCTTCGTGGAATTGTTCATCTCTACAGAGGGCAGCCACCAGCAACCTACTGGATCAGGAACCCACGCACCATCAAAGAGGAAAAGCATCCGTGGTAAACACCCGAGGTGATGAACCTGCTCCCAAAGAGCAAAGACAAAAACTAACTCAACCTGCCGCACAGACACACATGCTCGTTCTTTTTTTTTCTTTTTTGGTTTTTCCAGACAGGGTTTCTCTGTATAGCCCTGGCTGTCCTGGAACTCACTTTGTAGACCAGG

CCTGGTCTACAAAGTGAGTTCCAGGACAGCCAGGGCTATACAGAGAAACCCTGTCTGGAAAAACCAAAAAAGAAAAAAAAAGAACGAGCATGTGTGTCTGTGCGGCAGGTTGAGTTAGTTTTTGTCTTTGCTCTTTGGGAGCAGGTTCATCACCTCGGGTGTTTACCACGGATGCTTTTCCTCTTTGATGGTGCGTGGGTTCCTGATCCAGTAGGTTGCTGGTGGCTGCCCTCTGTAGAGATGAACAATTCCACGAAGCTTTTACTAAGTACAAGTCTGTTTACCTGACATTAAATGGCAGCGTGAGCATTTTGTGTGGGAAGACAGTGTTGATCTCAGGG

WARNING @ Wed, 13 Jul 2016 15:43:15:
         The amplicon sequence chr13 provided:
CTTCTGCAGAGTCAGCTTCTTTGTCATTATATAAGAGTACAGGCACTCCCCCTCAATTTATAATGGAGTCACACCCGAGATAATCCCACAGAAGTGGAAAACACCCTAAGTTGAAGATGTGTTTTGCCTTTCTCAAACATGCTTGGAGCCTTAGCATGCAGTTTAACAAACCCTCTTAACACAAAGCCAGTTTATAATGAGAGCTCAAATATCTTCTGTAAAGTACAGTGCTGAAGTGAGC

is different from the reference sequence(both strand):

CTTCTGCAGAGTCAGCTTCTTTGTCATTATATAAGAGTACAGGCACTCCCCCTCAATTTATAATGGAGTCACACCCGAGATAATCCCACAGAAGTGGAAAACACCCTAAGTTGAAGATGTGTTTTGCCTTTCTCAAACATGCTTGGAGCCTTAGCATGCAGTTTAACAAACCCTCTTAACACAAAGCCAGTTTATAATGAGAGCTCAAATATCTTCTGTAAAGTACAGTGCTGAAGTGAGC

GCTCACTTCAGCACTGTACTTTACAGAAGATATTTGAGCTCTCATTATAAACTGGCTTTGTGTTAAGAGGGTTTGTTAAACTGCATGCTAAGGCTCCAAGCATGTTTGAGAAAGGCAAAACACATCTTCAACTTAGGGTGTTTTCCACTTCTGTGGGATTATCTCGGGTGTGACTCCATTATAAATTGAGGGGGAGTGCCTGTACTCTTATATAATGACAAAGAAGCTGACTCTGCAGAAG

WARNING @ Wed, 13 Jul 2016 15:43:15:
         The amplicon sequence chr10 provided:
CATCTAGCTGGTTCCTCCTTTCATTACTTCAATTCATCCACTTTGTGGTGCCACAAAGGGATTTAAAATGTCACAAAGACCGAGGCCACCAATTCCTTACCCTGTGGAGAGATAGACACTGTAGTCACTCAGGACACATTGGTCTCTTAAAGCAGGTCCTGCACAGTCAGGATGCCACAGCAATGCTAAACACCTGCAGCTGGAGTGTTTCTTGCTCGTTACAGTTCTTGACTGCACTGGATAATGTAAAGGTTGGATAATGAGTTGATCTCCGAACTGTTCTGTGGACCAATGAAACTGTAGCAAGCAG

is different from the reference sequence(both strand):

CATCTAGCTGGTTCCTCCTTTCATTACTTCAATTCATCCACTTTGTGGTGCCACAAAGGGATTTAAAATGTCACAAAGACCGAGGCCACCAATTCCTTACCCTGTGGAGAGATAGACACTGTAGTCACTCAGGACACATTGGTCTCTTAAAGCAGGTCCTGCACAGTCAGGATGCCACAGCAATGCTAAACACCTGCAGCTGGAGTGTTTCTTGCTCGTTACAGTTCTTGACTGCACTGGATAATGTAAAGGTTGGATAATGAGTTGATCTCCGAACTGTTCTGTGGACCAATGAAACTGTAGCAAGCAG

CTGCTTGCTACAGTTTCATTGGTCCACAGAACAGTTCGGAGATCAACTCATTATCCAACCTTTACATTATCCAGTGCAGTCAAGAACTGTAACGAGCAAGAAACACTCCAGCTGCAGGTGTTTAGCATTGCTGTGGCATCCTGACTGTGCAGGACCTGCTTTAAGAGACCAATGTGTCCTGAGTGACTACAGTGTCTATCTCTCCACAGGGTAAGGAATTGGTGGCCTCGGTCTTTGTGACATTTTAAATCCCTTTGTGGCACCACAAAGTGGATGAATTGAAGTAATGAAAGGAGGAACCAGCTAGATG

INFO  @ Wed, 13 Jul 2016 15:43:15:
         The uncompressed reference fasta file for /gpfs/fs2/scratch/aaiezza/data/ref_genome/mouse/musculus is already present! Skipping generation.

INFO  @ Wed, 13 Jul 2016 15:43:15:
         Aligning reads to the provided genome index...

INFO  @ Wed, 13 Jul 2016 15:43:18:
         Demultiplexing reads by location...

gzip: /gpfs/fs2/scratch/aaiezza/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184/MAPPED_REGIONS//*.fastq: No such file or directory
INFO  @ Wed, 13 Jul 2016 15:43:18:
         Processing amplicon:Fmn1

WARNING @ Wed, 13 Jul 2016 15:43:18:
         The amplicon Fmn1 doesn't have any read mapped to it!
 Please check your amplicon sequence.

INFO  @ Wed, 13 Jul 2016 15:43:18:
         Processing amplicon:Dntt

WARNING @ Wed, 13 Jul 2016 15:43:18:
         The amplicon Dntt doesn't have any read mapped to it!
 Please check your amplicon sequence.

INFO  @ Wed, 13 Jul 2016 15:43:18:
         Processing amplicon:Ankrd10

WARNING @ Wed, 13 Jul 2016 15:43:18:
         The amplicon Ankrd10 doesn't have any read mapped to it!
 Please check your amplicon sequence.

INFO  @ Wed, 13 Jul 2016 15:43:18:
         Processing amplicon:Mt1

WARNING @ Wed, 13 Jul 2016 15:43:18:
         The amplicon Mt1 doesn't have any read mapped to it!
 Please check your amplicon sequence.

INFO  @ Wed, 13 Jul 2016 15:43:18:
         Processing amplicon:Psmd13

WARNING @ Wed, 13 Jul 2016 15:43:18:
         The amplicon Psmd13 doesn't have any read mapped to it!
 Please check your amplicon sequence.

INFO  @ Wed, 13 Jul 2016 15:43:18:
         Processing amplicon:Asap1

WARNING @ Wed, 13 Jul 2016 15:43:18:
         The amplicon Asap1 doesn't have any read mapped to it!
 Please check your amplicon sequence.

INFO  @ Wed, 13 Jul 2016 15:43:18:
         Processing amplicon:chr10_1

WARNING @ Wed, 13 Jul 2016 15:43:18:
         The amplicon chr10_1 doesn't have any read mapped to it!
 Please check your amplicon sequence.

INFO  @ Wed, 13 Jul 2016 15:43:18:
         Processing amplicon:chr14

WARNING @ Wed, 13 Jul 2016 15:43:18:
         The amplicon chr14 doesn't have any read mapped to it!
 Please check your amplicon sequence.

INFO  @ Wed, 13 Jul 2016 15:43:18:
         Processing amplicon:chr13

WARNING @ Wed, 13 Jul 2016 15:43:18:
         The amplicon chr13 doesn't have any read mapped to it!
 Please check your amplicon sequence.

INFO  @ Wed, 13 Jul 2016 15:43:18:
         Processing amplicon:chr10

WARNING @ Wed, 13 Jul 2016 15:43:18:
         The amplicon chr10 doesn't have any read mapped to it!
 Please check your amplicon sequence.

INFO  @ Wed, 13 Jul 2016 15:43:18:
         Reporting problematic regions...

/software/crispresso/b1/lib/python2.7/site-packages/CRISPResso/CRISPRessoPooledCORE.py:771: FutureWarning: convert_objects is deprecated.  Use the data-type specific converters pd.to_datetime, pd.to_timedelta and pd.to_numeric.
  df_regions=df_regions.convert_objects(convert_numeric=True)
CRITICAL @ Wed, 13 Jul 2016 15:43:18:

ERROR: Cannot set a frame with no defined index and a value that cannot be converted to a Series

~~~CRISPRessoPooled~~~
-Analysis of CRISPR/Cas9 outcomes from POOLED deep sequencing data-

              )                                            )
             (           _______________________          (
            __)__       | __  __  __     __ __  |        __)__
         C\|     \      ||__)/  \/  \|  |_ |  \ |     C\|     \
           \     /      ||   \__/\__/|__|__|__/ |       \     /
            \___/       |_______________________|        \___/

[Luca Pinello 2015, send bugs, suggestions or *green coffee* to lucapinello AT gmail DOT com]

Version 0.9.7

Mapping amplicons to the reference genome...
srun: error: bhc0045: task 0: Exited with exit code 255
srun: Terminating job step 8023737.1
lucapinello commented 8 years ago

Sorry for the slow response, but I am getting married in a few days :)

Thanks for the detailed log. It seems that there is a problem with the mapping of the amplicon to the reference genome. Under the hood I use bowtie2 to perform this operation.

Could you please check/try two things:

1) That bowtie2 is installed properly and that the index provided is correct.

2) Try to run CRISPResso in "genome only mode" and in "amplicon only mode" to see whether the problem is the calls to bowtie2 in general or just the mapping of the amplicons. To do that you can use these two commands:

GENOME ONLY MODE:

CRISPRessoPooled \
    --fastq_r1 A5_S184_L001_R1_001.fastq.gz \
    --fastq_r2 A5_S184_L001_R2_001.fastq.gz \
    --bowtie2_index /data/ref_genome/mouse/musculus \
    --gene_annotations /data/ref_genome_annot/ucsc/mouse/vMM10.annotation.gz \
    --n_processes 4 \
    --name A5_S184 \
    --trim_sequences \
    --max_paired_end_reads_overlap 250 \
    --exclude_bp_from_left 10 \
    --exclude_bp_from_right 10 \
    --output_folder cspresso/A5_S184 \
    --save_also_png

AMPLICON ONLY MODE

CRISPRessoPooled \
    --fastq_r1 A5_S184_L001_R1_001.fastq.gz \
    --fastq_r2 A5_S184_L001_R2_001.fastq.gz \
    --amplicons_file amplicons_description.txt \
    --gene_annotations /data/ref_genome_annot/ucsc/mouse/vMM10.annotation.gz \
    --n_processes 4 \
    --name A5_S184 \
    --trim_sequences \
    --max_paired_end_reads_overlap 250 \
    --exclude_bp_from_left 10 \
    --exclude_bp_from_right 10 \
    --output_folder cspresso/A5_S184 \
    --save_also_png
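The first check in the list above (bowtie2 installed, index complete) could be sketched roughly as follows. This is an illustration only and not part of CRISPResso: the helper names are hypothetical, and it only tests that the bowtie2 executable is on the PATH and that the six files bowtie2-build normally writes for an index prefix (.bt2, or .bt2l for large genomes) are present. It does not verify that the index matches the intended genome build.

```python
import os
import shutil
import tempfile

# File suffixes written by bowtie2-build for a small index; large indexes
# use the same names with a trailing "l" (".1.bt2l", ...).
BT2_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def bowtie2_on_path():
    """True if a bowtie2 executable can be found on the PATH."""
    return shutil.which("bowtie2") is not None


def missing_index_files(index_prefix):
    """Return the index files that are absent for index_prefix.

    Both the small (.bt2) and large (.bt2l) layouts are checked and the
    shorter list of missing files is returned, so a complete index of
    either kind yields an empty list.
    """
    small = [index_prefix + s for s in BT2_SUFFIXES]
    large = [index_prefix + s + "l" for s in BT2_SUFFIXES]
    missing_small = [p for p in small if not os.path.isfile(p)]
    missing_large = [p for p in large if not os.path.isfile(p)]
    return min(missing_small, missing_large, key=len)


# Demonstration on a throwaway directory, so no real index is needed.
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "musculus")
    before = missing_index_files(prefix)
    for suffix in BT2_SUFFIXES:
        open(prefix + suffix, "w").close()
    after = missing_index_files(prefix)

print("bowtie2 on PATH:", bowtie2_on_path())
print("missing before:", len(before), "missing after:", len(after))
```

In practice the prefix would be the value passed to --bowtie2_index, for example /data/ref_genome/mouse/musculus.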

Thanks and I hope we can solve this quickly!

aaiezza commented 8 years ago

Hey congratulations!

I have tried running it with just the amplicons, without the annotation file given, and it seems to work great. (Though I'm not sure how to get the nice graphs you get when using the web UI.)

I've also tried running against just a bowtie2 genome index, and I still get the same issue of the amplicons not mapping to the reference. My guess is that my amplicon description file might be messed up:

"Fmn1_(chr2:-113435153,_intronic),_score_=0.3_4MM_[1:3:11:19],_306bp"   GTCTTTGAGTTTGGGCAGAATTTCTAAACTATATCCGTCTGCTTGCCCTCGCGTCCTGTTTCCTCCATTCATGTAGTTTCCAAAAGCCAGAATGAGAGCTAAGATATCCTTCACACTCTTCATGTGCAACAAGCCCTGCAGTGGCAAACACCAGCAGTTATGGTCTGAACTACAAACACAGTCATATTCGCTCGCTCCACAGCTAACCCTCATTAAAGTAACCAAATCCTGGTAAGTGGCTTGTGATTAGTTGTATCAACAGTTGGCAAATACAAGATACATTTCACTACAGCAGTATCATGTGGG  ACAAGCCCTGCAGTGGCAAACACCAGCAGTTA    NA  NA
"Dntt_(chr19:+41130149,_intronic),_score=0.2_4MM_[9:10:11:20],_326bp"   TAGAAACAACGCTCCTACTGTCCATTTATCCACTCAACAGATATTCACACATCACCTGCCGCCTGTTGGGCAGTGATTGACAAGGGTCCAAGCCAATCAGCTTCTTGTTCACATTGAGTTTCTGTTCTAGTACAGGGAGCCCAGAGCTCAGCCACCCGGGAGCTTTGCCCTGAGGAAGAAAGTCACCCCAAAAATTTATGTTAGAGACAGCAGTTTCAAACACCCAAGGGCTTTGGATAGCTCTAAACACCGTGTACCCGAAATAATCTGGACTAGACGGTAATTTGTTTTAATTCTCTTTGTAGCAGTTTGAGAGAGACTTGCGG  ACAGCAGTTTCAAACACCCA    NA  NA
"Ankrd10_(chr8:+11614650,_intronic),_score=0.2_4MM_[5:7:11:19],_220bp"  TGTGCACAGCGTGCTATTTCACACTAGGAATTGGCAAGAATCTCTAGGGAGTGCCAACACGTTTCCTGAGTCAGACAGTACTGGAAAACACCAGAAGGCCCACGTGGCCTTCTGACATCCAGGACGACCTGCCTGCTGGCATGAGAAGAGCGAAGAGCTTCTTTCTCCTGCCTGACCAGGAAGGGAACAATGCTGTCTCCATAAAGGAGAGGCTCTGGCT    ACAGTACTGGAAAACACCAGA   NA  NA
"Mt1_(chr8:-96703655,_intronic),_score=0.1_4MM_[11:12:19:20],_330bp"    GGACATTTCTCAGAGCCAGTTTTGTAGGAGTTCCCCGCCCCTAGCCTTAGCCGCCACCCAAGGTGTCCCAACTCACTCTTCTTGCAGGAGGTGCACTTGCAGTTCTTGCAGGCGCAGGAGCTGGTGCAAGTGCAGGAGCCGCCTGGGGAGGAGAAAGAAGACAGCATGAGGGAGGCAGCATTACAGCAGTGGCCAACACCACGAGTCCCGGCTCAGTTCACTAAGTCCTCCTCGGAGCTGCAGGGAGCCTAGCCCCACTTTTCTCCTCACAGGTTAAGTCAGGGATTATGTCTTTGAGTCCCAAGACATAAAGGTCCTTCACCTCTTTCT  ACAGCAGTGGCCAACACCACGAGTCC  NA  NA
"Psmd13_(chr7:-148083641,_intronic),_score=0.0_4MM_[7:16:18:20],_371bp" CATGATAGGGCATGGGATCAAGTCACAAAACCAGGAACACGTCTGCTGGAGCAGCAATTTCAGGATTAGGAGGCATCAGCAGCGCAGCCAGCCTGGAAGTGCAGGGCGCAGACCTCAGAGGGCTGCTTTGCTAGGCCTCCACAGCAGTAATCCCACCTTGTGTGCGACAGCGTCCGCTTCCTAAATGGCTTCTGTCCACATAAAACTAGGACAGCATTGGTAAACCCGCAAAGGGCAGGGGCCCAGCCCCCTTACCTGCTGCAAATCCAGCACTCGCGGCTGCACCCACGTCATGTGAACCCGCTTGTCCACCTCGTCTATGCTGCCTCTCACCAGCCCCACCGAGAGTGCCTTCATCACCAGCAACTCCA ACAGCATTGGTAAACCCGCAA   NA  NA
"Asap1_chr15:+64146590  Score=0.6_3MMs_[1:17:20],_305bp"  GTTTGGGTGGCATGAGTTTATCAAATAAGAGGGTAAGGCGTGTCAAAAATGACCACCACACAGAGCCCTCCCAGCTCCAGCCGGTTTGCTCCTGGCTCTGCAGATGAACGAGTCAAGATTATTCCAAGCTCAGCAGTGGTAAACAGCCAGGGATTTCTTTCTGAATTCACCGGAGAGCCCAAACTGCCGCCACCAATTGCTGTTTCAGTTTCCTCCGAAGATTATGTTATGTATCTGCCCCTCCCCTGCCTCCTCCAGCCAAGAGGGGACTATATGAACAATGAGATATTGTGCTCTGGTAAGCA   TCAGCAGTGGTAAACAGCCA    NA  NA
"chr10:+121100989_Score=1.5_3MMs_[4:8:9],_292bp"    GATGGAGGATGGGAAGAACAATAATTAGAGGGCCACGGTCACGGGATGCGCACAGGCAGAGCTCCTCAGCGCCTCTCAGATGTGAGGCCGAAGCCTAATTATGAAAAGCTGCTGGGTCGGAAGACAGAGGCTGCTGTCTTGGGACATCAGATGCATAAGTGAGATTACTTTTCAGGATAGTGATAAACAACAGGCGTAAACACCCGAGGGAGGGATGGAAAACAGACTCGTGGTCTCTGATGAGGAGATCAGTACCCAGGTTTCGCTCTCCTTAGGGTGACTTCATCAGTGG    ACAACAGGCGTAAACACCCG    NA  NA
"chr14:+79726708__Score=1.5_3MMs_[1:4:6],_341bp"    CCCTGAGATCAACACTGTCTTCCCACACAAAATGCTCACGCTGCCATTTAATGTCAGGTAAACAGACTTGTACTTAGTAAAAGCTTCGTGGAATTGTTCATCTCTACAGAGGGCAGCCACCAGCAACCTACTGGATCAGGAACCCACGCACCATCAAAGAGGAAAAGCATCCGTGGTAAACACCCGAGGTGATGAACCTGCTCCCAAAGAGCAAAGACAAAAACTAACTCAACCTGCCGCACAGACACACATGCTCGTTCTTTTTTTTTCTTTTTTGGTTTTTCCAGACAGGGTTTCTCTGTATAGCCCTGGCTGTCCTGGAACTCACTTTGTAGACCAGG   GCATCCGTGGTAAACACCCG    NA  NA
"chr13:-81283131__Score=0.8_3MMs_[5:11:20],_241bp"  CTTCTGCAGAGTCAGCTTCTTTGTCATTATATAAGAGTACAGGCACTCCCCCTCAATTTATAATGGAGTCACACCCGAGATAATCCCACAGAAGTGGAAAACACCCTAAGTTGAAGATGTGTTTTGCCTTTCTCAAACATGCTTGGAGCCTTAGCATGCAGTTTAACAAACCCTCTTAACACAAAGCCAGTTTATAATGAGAGCTCAAATATCTTCTGTAAAGTACAGTGCTGAAGTGAGC   NA  NA  NA
"chr10:+83240905__Score=0.6_3MMs_[7:10:19],_310bp"  CATCTAGCTGGTTCCTCCTTTCATTACTTCAATTCATCCACTTTGTGGTGCCACAAAGGGATTTAAAATGTCACAAAGACCGAGGCCACCAATTCCTTACCCTGTGGAGAGATAGACACTGTAGTCACTCAGGACACATTGGTCTCTTAAAGCAGGTCCTGCACAGTCAGGATGCCACAGCAATGCTAAACACCTGCAGCTGGAGTGTTTCTTGCTCGTTACAGTTCTTGACTGCACTGGATAATGTAAAGGTTGGATAATGAGTTGATCTCCGAACTGTTCTGTGGACCAATGAAACTGTAGCAAGCAG  NA  NA  NA

I wasn't incredibly certain what to put for the sgRNA for each amplicon and some are just NA as directed in the manual.
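A quick sanity check of a description file like the one above could look like the sketch below. It is not part of CRISPResso and rests on assumptions taken from the rows shown: tab-separated columns with the name first, the amplicon second, and the sgRNA (or NA) third. The function check_amplicon_rows and the toy rows are hypothetical. It flags rows with too few columns, amplicons containing characters other than A, C, G, T, and sgRNAs that occur on neither strand of their amplicon.

```python
import csv
import io

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence made of A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def check_amplicon_rows(handle):
    """Yield (name, problem) for rows that look suspicious.

    Assumed layout (from the file shown above): tab-separated,
    name, amplicon sequence, sgRNA or NA, then optional columns.
    """
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        fields = [f.strip() for f in fields]
        if len(fields) < 2:
            yield fields[0], "fewer than 2 tab-separated columns"
            continue
        name, amplicon = fields[0], fields[1].upper()
        if set(amplicon) - set("ACGT"):
            yield name, "amplicon contains characters other than A, C, G, T"
            continue
        sgrna = fields[2].upper() if len(fields) > 2 else "NA"
        if sgrna in ("NA", ""):
            continue  # no guide given for this amplicon
        if sgrna not in amplicon and reverse_complement(sgrna) not in amplicon:
            yield name, "sgRNA not found in amplicon on either strand"


# Toy rows (made up) standing in for amplicons_description.txt.
example = (
    "amp1\tACGTACGTTTGACC\tCGTTTG\tNA\tNA\n"
    "amp2\tACGTACGTTTGACC\tGGGGGG\tNA\tNA\n"
    "amp3\tACGTNNXX\tNA\tNA\tNA\n"
)
problems = list(check_amplicon_rows(io.StringIO(example)))
for name, problem in problems:
    print(name, "->", problem)
```

For a real file, open("amplicons_description.txt", newline="") can be passed in place of the io.StringIO object.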

Obviously with the excitement of your wedding approaching, please take your time in responding to enjoy it!

aaiezza commented 8 years ago

So I actually figured out my issue. I'm working on a NAS share that's CIFS-mounted; running on the local file system works fine. On the share, my major error is that the file handle opener is unhappy about the mounted share and gives the following message at runtime:

/bin/sh: /cvri/miano/amplicon_exp/cspresso/A5_S184/CRISPRessoPooled_on_A5_S184/Miano1-A5_S184_L001.assembled.fastq: No such file or directory

The file isn't seen for some reason, yet the "demultiplexing" portion of the code runs anyway, and every amplicon returns the same message:

gzip: stdin: unexpected end of file

Very informative troubleshooting. Sadly, just one more reason I dislike Python so. Strangely enough, this error doesn't happen with the --trim-sequences flag set. It DOES still occur afterward, however, on the demultiplexed gzipped files.

This leads me to believe the error may exist in both the CRISPRessoCORE.py and CRISPRessoPooledCORE.py files.

This is only inconvenient for now, but at least it works! And it does work well on the local file system.

lucapinello commented 8 years ago

Hi Alex, thanks for the detailed analysis. This was hard to catch/debug.

I have used CRISPResso on a network drive before and never had this problem, probably because our network drive is fast enough (but we use NFS, not CIFS).

For the first error, I don't know why it cannot open the file properly, since I am using the standard call to open a file.

For the second error, the relevant code for the demultiplexing is here:

        #align in unbiased way the reads to the genome
        if RUNNING_MODE=='ONLY_GENOME' or RUNNING_MODE=='AMPLICONS_AND_GENOME':
            info('Aligning reads to the provided genome index...')
            bam_filename_genome = _jp('%s_GENOME_ALIGNED.bam' % database_id)
            aligner_command= 'bowtie2 -x %s -p %s -k 1 --end-to-end -N 0 --np 0 -U %s 2>>%s| samtools view -bS - > %s' %(args.bowtie2_index,args.n_processes,processed_output_filename,log_filename,bam_filename_genome)
            sb.call(aligner_command,shell=True)

            N_READS_ALIGNED=get_n_aligned_bam(bam_filename_genome)

            #REDISCOVER LOCATIONS and DEMULTIPLEX READS
            MAPPED_REGIONS=_jp('MAPPED_REGIONS/')
            if not os.path.exists(MAPPED_REGIONS):
                os.mkdir(MAPPED_REGIONS)

            s1=r'''samtools view -F 0x0004 %s 2>>%s |''' % (bam_filename_genome,log_filename)+\
            r'''awk '{OFS="\t"; bpstart=$4;  bpend=bpstart; split ($6,a,"[MIDNSHP]"); n=0;\
            for (i=1; i<=length(a); i++){\
                n+=1+length(a[i]);\
                if (substr($6,n,1)=="S"){\
                    if (bpend==$4)\
                        bpstart-=a[i];\
                    else
                        bpend+=a[i];
                    }\
                else if( (substr($6,n,1)!="I")  && (substr($6,n,1)!="H") )\
                        bpend+=a[i];\
                }\
                if ( ($2 % 32)>=16)\
                    print $3,bpstart,bpend,"-",$1,$10,$11;\
                else\
                    print $3,bpstart,bpend,"+",$1,$10,$11;}' | ''' 

            s2=r'''  sort -k1,1 -k2,2n  | awk \
            'BEGIN{chr_id="NA";bpstart=-1;bpend=-1; fastq_filename="NA"}\
            { if ( (chr_id!=$1) || (bpstart!=$2) || (bpend!=$3) )\
                {\
                if (fastq_filename!="NA") {close(fastq_filename); system("gzip "fastq_filename)}\
                chr_id=$1; bpstart=$2; bpend=$3;\
                fastq_filename=sprintf("__OUTPUTPATH__REGION_%s_%s_%s.fastq",$1,$2,$3);\
                }\
            print "@"$5"\n"$6"\n+\n"$7 >> fastq_filename;\
            }' '''
            cmd=s1+s2.replace('__OUTPUTPATH__',MAPPED_REGIONS)

            info('Demultiplexing reads by location...')
            sb.call(cmd,shell=True)

            #gzip the missing ones 
            sb.call('gzip %s/*.fastq' % MAPPED_REGIONS,shell=True)

As you can see, the final call compresses the demultiplexed reads by location. Since the network drive is probably too slow in your case, the files may not yet be synced by the time I try to compress them in the last line or use them later. One quick workaround to cover this case would be to add a delay before and after the last line to give the CIFS share time to sync.

For example, this will wait for 60 seconds:

time.sleep(60)  # needs "import time" at the top of the file
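
A fixed delay is a guess at how long the share needs. As a rough alternative, the sketch below polls until a file is visible and its size has stopped changing between two checks. The helper name `wait_for_stable_file` and its parameters are hypothetical (nothing like it exists in CRISPResso), it uses `time.monotonic`, so it assumes Python 3, and a stable size is only a heuristic: it does not prove that the CIFS share is fully in sync.

```python
import os
import time


def wait_for_stable_file(path, timeout=60.0, poll_interval=1.0):
    """Return True once `path` exists and its size is unchanged between two polls.

    Hypothetical helper, not part of CRISPResso. Returns False if the
    timeout expires first.
    """
    deadline = time.monotonic() + timeout
    last_size = None
    while True:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = None  # file not visible (yet) on this mount
        # Two consecutive polls reporting the same size: treat as settled.
        if size is not None and size == last_size:
            return True
        last_size = size
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

It could be called on the merged FASTQ file before the alignment step, or on each demultiplexed file before it is used, with the run aborted when it returns False.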

Hope this is helpful.
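
On the pipeline carrying on after the missing-file error: assuming `sb` in the snippet above is the standard `subprocess` module, `sb.call` only returns the command's exit status and does not raise, so a failed step goes unnoticed unless the return value is checked. A minimal sketch of a fail-fast wrapper follows; `run_checked` is a hypothetical name, not an existing CRISPResso function.

```python
import subprocess


def run_checked(cmd):
    """Run a shell command line and raise if it exits with a non-zero status."""
    # check_call raises subprocess.CalledProcessError on failure,
    # whereas call only returns the exit status.
    subprocess.check_call(cmd, shell=True)
```

Note that for a shell pipeline such as `a | b`, the shell reports the exit status of the last command only, so a failure earlier in the pipe would still pass unnoticed with this wrapper.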