pinellolab / CRISPR-Correct

Perform CRISPR guide mapping and analysis that accounts for protospacer self-editing and surrogate sensor sequences. Self-editing-aware mapping takes a Hamming-distance approach that is unbiased by expected editing patterns.
GNU Affero General Public License v3.0
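
The Hamming-distance mapping named in the project description can be illustrated with a minimal sketch. This is not CRISPR-Correct's implementation: the function names, the inclusive `max_distance` cutoff, and the way ties are returned are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def closest_guides(
    read: str, guides: List[str], max_distance: int
) -> Optional[Tuple[int, List[str]]]:
    """Return (distance, guides) for the nearest library guide(s).

    Hypothetical helper: `max_distance` is treated as an inclusive cutoff,
    and every guide tied at the minimum distance is returned so that
    ambiguous matches stay visible to the caller.
    """
    best_distance = None
    best_guides: List[str] = []
    for guide in guides:
        distance = hamming_distance(read, guide)
        if best_distance is None or distance < best_distance:
            best_distance, best_guides = distance, [guide]
        elif distance == best_distance:
            best_guides.append(guide)
    if best_distance is None or best_distance > max_distance:
        return None  # No guide is close enough to the observed read
    return best_distance, best_guides
```

Because only the number of mismatches is scored, a read that carries edits at any positions still maps to its guide as long as it stays within the cutoff. For example, `closest_guides("GCCTCTGCCTGGTCTGTGGA", ["GCCTCTGCCTGGTCTGTGGG", "TGGTCTGTGGGGACGTGGCC"], 7)` maps the read to the first guide at distance 1.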

Run CRISPR-Correct with R1 fastq only #3

Open Edert opened 4 days ago

Edert commented 4 days ago

Hi,

I am trying to run CRISPR-Correct with just one FASTQ file (R1 only). According to the README, this should be possible. How should I set the fastq_r2_fn parameter, since it seems to be required? None, NULL, and an empty string '' do not work.

My Python code:

import crispr_ambiguous_mapping
import pandas as pd
import gzip
import shutil

PROTOSPACER_HAMMING_THRESHOLD = 7
CPUS = 16

GUIDE_LIBRARY_DATAFRAME = pd.read_table("ref/lib1_for_CRISPR_correct.txt")

GUIDE_LIBRARY_DATAFRAME
                protospacer
0      GCCTCTGCCTGGTCTGTGGG
1      TGGTCTGTGGGGACGTGGCC
2      CATCCTGTGAGGCCTGCAAA
3      AGGCCTGCAAAGCCTTCTTC
4      ACAGCTGTCCGGCCTCCAAC
...                     ...
56100  TGGGGGTGTTCTGCTGGTAG
56101  TGGTTGTCGGGCAGCAGCAC
56102  TGTACTCCAGCTTGTGCCCC
56103  TGTGATCGCGCTTCTCGTTG
56104  TTCAAGTCCGCCATGCCCGA

[56105 rows x 1 columns]

INPUT_GZ_FILE = "PlasmidPool_R1.fastq.gz"
FASTQ_FILE = "PlasmidPool_R1.fastq"

# Unzip the FASTQ file
with gzip.open(INPUT_GZ_FILE, 'rb') as gz_file:
    with open(FASTQ_FILE, 'wb') as extracted_file:
        shutil.copyfileobj(gz_file, extracted_file)

count_result = crispr_ambiguous_mapping.mapping.get_whitelist_reporter_counts_from_umitools_output(
    whitelist_guide_reporter_df=GUIDE_LIBRARY_DATAFRAME,
    fastq_r1_fn=FASTQ_FILE,
    fastq_r2_fn='None',
    protospacer_hamming_threshold_strict=PROTOSPACER_HAMMING_THRESHOLD,
    cores=CPUS)

and this is my fastq file:

head PlasmidPool_R1.fastq
@LH00392:128:22F7GGLT4:3:1101:41340:1042 1:N:0:ATACCTGT
CATGCAAGACAGGTCACAAG
+
IIIIIIIIIIIIIIIIIIII
@LH00392:128:22F7GGLT4:3:1101:42100:1042 1:N:0:ATACCTGT
GTTCTGGACATTCACCATCC
+
IIIIIIIIIIIIIIIIIIII
@LH00392:128:22F7GGLT4:3:1101:43152:1042 1:N:0:ATACCTGT
GTGCTCTAGCTGTCAAGCTT

Best regards, Thomas

CodingBash commented 10 hours ago

Hi Thomas,

I just updated the repository. Please see the updated README and install the latest version of CRISPR-Correct: pip install crispr-ambiguous-mapping==0.0.177

Here is a draft of what the function should look like with R1 only:

result = crispr_ambiguous_mapping.mapping.get_whitelist_reporter_counts_from_fastq(
    whitelist_guide_reporter_df=GUIDE_LIBRARY_DATAFRAME,
    fastq_r1_fn=INPUT_GZ_FILE,  # The tool accepts gzipped files

    protospacer_start_position=0,
    protospacer_length=20,

    is_protospacer_r1=True,
    is_protospacer_header=False,
    revcomp_protospacer=False,
    protospacer_hamming_threshold_strict=7,
    cores=CPUS)

The updated version isn't as robustly tested, so let me know if you still have issues.
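
Before mapping, it may help to confirm that protospacer_start_position = 0 and protospacer_length = 20 match the reads. The following is a minimal sketch that tallies read lengths straight from a plain or gzipped FASTQ file. The helper name `read_length_counts` and its `max_reads` parameter are hypothetical and not part of crispr_ambiguous_mapping, and the code assumes standard four-line FASTQ records.

```python
import gzip
from collections import Counter
from itertools import islice


def read_length_counts(fastq_path, max_reads=10000):
    """Tally sequence lengths for the first `max_reads` FASTQ records.

    Hypothetical helper. Assumes each record spans exactly four lines
    (header, sequence, '+', qualities), so sequences sit on lines
    1, 5, 9, ... when counting from zero.
    """
    # Pick the opener from the file extension; both are read in text mode.
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for sequence in islice(handle, 1, 4 * max_reads, 4):
            counts[len(sequence.strip())] += 1
    return counts
```

Calling `read_length_counts("PlasmidPool_R1.fastq.gz")` returns a Counter keyed by read length. If every sampled read is 20 nt long, as in the excerpt in the question, the whole read is the protospacer and the start position of 0 with length 20 is consistent with the data.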