lehner-lab / DiMSum

An error model and pipeline for analyzing deep mutational scanning (DMS) data and diagnosing common experimental pathologies
MIT License

"WT variant not found. Did you mean to specify one of the following?" #7

Closed carolinelangley closed 2 years ago

carolinelangley commented 2 years ago

Hello, I am trying to process some reads that do not have any barcodes. The command I am running is:

"DiMSum --fastqFileDir fastq --experimentDesignPath experimentaldesign.txt --wildtypeSequence atttcaggtgtcgtgagcggccgcATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAAGTGGACAGGATGAAGATTAAAACATGGAAAAGTTTAGTAAAGCATCATATGTATGTTTCAAAGAAGGCTAGGAGATGGTTTTATAGACATCACTATGAAAGCACTCATCCAAAAATAAGTTCAGAAGTACACATCCCACTAGAGAAGGGTGAATTGGTAGTAACAACATATTGGGGTCTGCATACAGGAGAAAGAGACTGGCATTTGGGTCAGGGAGTCTCCATAGAATGGAGGAAAGGGAGATATAGCACACAAGTAGACCCTGACCTAGCAGACCAACTAATTCATCTGTATTACTTTGACTGTTTTTCAGAATCTGCT --mixedSubstitutions T --mutagenesisType codon"

Every time, I keep getting this error:

DiMSum STAGE 4 (STEAM): PROCESS VARIANT SEQUENCES

Loading variant count files:
./DiMSum_Project/tmp/3_tally/A3G1203preselection_e1_s0_bNA_t1.vsearch.unique
./DiMSum_Project/tmp/3_tally/A3G1203postselection_e1_s1_b1_t1.vsearch.unique
Processing...
A3G1203preselection_e1_s0_bNA_t1.vsearch.unique
A3G1203postselection_e1_s1_b1_t1.vsearch.unique
Processing merged variants...
WT variant not found. Did you mean to specify one of the following?

caught segfault address (nil), cause 'unknown'

Traceback:
1: is.sorted(jval, by = key(x))
2: [.data.table(variant_dt[all_reads == T, ], order(mean_count, decreasing = T)[1:5], .(nt_seq = toupper(nt_seq), all_reads, mean_count))
3: variant_dt[all_reads == T, ][order(mean_count, decreasing = T)[1:5], .(nt_seq = toupper(nt_seq), all_reads, mean_count)]
4: print(variant_dt[all_reads == T, ][order(mean_count, decreasing = T)[1:5], .(nt_seq = toupper(nt_seq), all_reads, mean_count)])
5: dimsum__process_merged_variants(dimsum_meta = dimsum_meta, input_dt = variant_data_merge)
6: dimsum_stage_merge(dimsum_meta = pipeline[["3_tally"]], merge_outpath = pipeline[["3_tally"]][["project_path"]], report_outpath = file.path(pipeline[["3_tally"]][["project_path"]], "reports"))
7: dimsum(runDemo = arg_list[["runDemo"]], fastqFileDir = arg_list[["fastqFileDir"]], fastqFileExtension = arg_list[["fastqFileExtension"]], gzipped = arg_list[["gzipped"]], stranded = arg_list[["stranded"]], paired = arg_list[["paired"]], barcodeDesignPath = arg_list[["barcodeDesignPath"]], barcodeErrorRate = arg_list[["barcodeErrorRate"]], experimentDesignPath = arg_list[["experimentDesignPath"]], experimentDesignPairDuplicates = arg_list[["experimentDesignPairDuplicates"]], barcodeIdentityPath = arg_list[["barcodeIdentityPath"]], countPath = arg_list[["countPath"]], cutadaptCut5First = arg_list[["cutadaptCut5First"]], cutadaptCut5Second = arg_list[["cutadaptCut5Second"]], cutadaptCut3First = arg_list[["cutadaptCut3First"]], cutadaptCut3Second = arg_list[["cutadaptCut3Second"]], cutadapt5First = arg_list[["cutadapt5First"]], cutadapt5Second = arg_list[["cutadapt5Second"]], cutadapt3First = arg_list[["cutadapt3First"]], cutadapt3Second = arg_list[["cutadapt3Second"]], cutadaptMinLength = arg_list[["cutadaptMinLength"]], cutadaptErrorRate = arg_list[["cutadaptErrorRate"]], cutadaptOverlap = arg_list[["cutadaptOverlap"]], vsearchMinQual = arg_list[["vsearchMinQual"]], vsearchMaxee = arg_list[["vsearchMaxee"]], vsearchMinovlen = arg_list[["vsearchMinovlen"]], outputPath = arg_list[["outputPath"]], projectName = arg_list[["projectName"]], wildtypeSequence = arg_list[["wildtypeSequence"]], permittedSequences = arg_list[["permittedSequences"]], reverseComplement = arg_list[["reverseComplement"]], sequenceType = arg_list[["sequenceType"]], mutagenesisType = arg_list[["mutagenesisType"]], transLibrary = arg_list[["transLibrary"]], transLibraryReverseComplement = arg_list[["transLibraryReverseComplement"]], bayesianDoubleFitness = arg_list[["bayesianDoubleFitness"]], bayesianDoubleFitnessLamD = arg_list[["bayesianDoubleFitnessLamD"]], fitnessMinInputCountAll = arg_list[["fitnessMinInputCountAll"]], fitnessMinInputCountAny = arg_list[["fitnessMinInputCountAny"]], fitnessMinOutputCountAll = arg_list[["fitnessMinOutputCountAll"]], fitnessMinOutputCountAny = arg_list[["fitnessMinOutputCountAny"]], fitnessHighConfidenceCount = arg_list[["fitnessHighConfidenceCount"]], fitnessDoubleHighConfidenceCount = arg_list[["fitnessDoubleHighConfidenceCount"]], fitnessNormalise = arg_list[["fitnessNormalise"]], fitnessErrorModel = arg_list[["fitnessErrorModel"]], indels = arg_list[["indels"]], maxSubstitutions = arg_list[["maxSubstitutions"]], mixedSubstitutions = arg_list[["mixedSubstitutions"]], retainIntermediateFiles = arg_list[["retainIntermediateFiles"]], splitChunkSize = arg_list[["splitChunkSize"]], retainedReplicates = arg_list[["retainedReplicates"]], startStage = arg_list[["startStage"]], stopStage = arg_list[["stopStage"]], numCores = arg_list[["numCores"]])
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault (core dumped)

Even if I fiddle around with the max mutations allowed, I still get this error. Does anyone know how to resolve this??

andrefaure commented 2 years ago

Hi @carolinelangley sorry you're struggling to get it to work with your data.

carolinelangley commented 2 years ago

Hi @andrefaure thanks for the reply!

I am running the most recent version and followed the installation instructions to a T. I do not have biological replicates for these samples. The data is paired end, 250 bp in both directions, and was sequenced on Illumina. The total size of the gene is 381 bp, and there is a 60 bp stretch of constant region on the 5' end before my DMS mutagenesis starts. These samples are not barcoded.

andrefaure commented 2 years ago

Hi @carolinelangley,

I think the issue is that currently very few of your reads (if any) are making it to stage 4. (I'm not sure why DiMSum isn't throwing a more informative error but I'll look into this).

The most likely reason for this is that you haven't specified the TRIM arguments: https://github.com/lehner-lab/DiMSum/blob/master/docs/ARGUMENTS.md#trim-arguments

Lastly it's always a good idea to directly inspect a fragment of the raw FASTQ files (e.g. in a text editor) before running DiMSum to check that the sequences indeed match your expectation...
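
As a minimal sketch of that check, the shell function below prints the first few FASTQ records (four lines each) from a plain or gzipped file. The function name fastq_head and the example file name are made up for illustration; substitute a file from your own --fastqFileDir.

```shell
# Print the first N FASTQ records (4 lines per record) from a plain or
# gzipped file. Usage: fastq_head <file> [n_records]   (default: 5 records)
fastq_head() {
  file=$1
  n=${2:-5}
  case "$file" in
    *.gz) gzip -dc "$file" | head -n $((n * 4)) ;;
    *)    head -n $((n * 4)) "$file" ;;
  esac
}

# Hypothetical file name, shown only as an example:
# fastq_head fastq/sample_R1.fastq.gz 3
```

Line 2 of each record is the read itself, so this is enough to see by eye whether the expected constant regions are at the start of the reads.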

Let me know if you're still having issues after troubleshooting these things!

carolinelangley commented 2 years ago

Hi @andrefaure I really appreciate all of the help. Now I'm running into this issue:

There were problems while running 'dimsum__cutadapt_report'
Error in dimsum_stage_cutadapt(dimsum_meta = pipeline[["1_split"]], cutadapt_outpath = file.path(pipeline[["1_split"]][["tmp_path"]], :
object 'dimsum_meta_new_report' not found
Calls: dimsum -> dimsum_stage_cutadapt
Execution halted
(dimsum) clangley@gizmok45:~/dimsum$

andrefaure commented 2 years ago

Hi @carolinelangley are you running DiMSum on Linux or Mac? Please also send me the result of running the following on the command line (with the dimsum conda environment activated):

conda list r-cairo

carolinelangley commented 2 years ago

Hi @andrefaure My computer is a Mac but I have been using a remote server to run the software.

(dimsum) clangley@gizmok45:~/dimsum$ conda list r-cairo
# packages in environment at /home/clangley/miniconda3/envs/dimsum:
#
# Name Version Build Channel
r-cairo 1.6_0 r41h06615bd_0 conda-forge

andrefaure commented 2 years ago

Ok it seems that r-cairo 1.6 breaks pandoc on Linux. You will need to downgrade to 1.5 as follows (again make sure the dimsum conda environment is activated):

conda install -c conda-forge r-cairo=1.5

Then try running DiMSum again (no need to reinstall) and let me know if that fixes it!

carolinelangley commented 2 years ago

@andrefaure Hmm I am still getting the same error as above.

andrefaure commented 2 years ago

Ok could you please run the full demo to double-check that you have a working installation? https://github.com/lehner-lab/DiMSum/blob/master/docs/DEMO.md#full-dimsum-demo-wrapsteam

If that works fine then the best would be if you could share your experiment design file and (a small fragment e.g. 100 reads of each of) the FASTQs you're using so that I can try to reproduce the error. If the FASTQ fragments are too large to attach here then you could try google drive/ dropbox/ ftp or some other file sharing service.

carolinelangley commented 2 years ago

I'm having the same error with the demo, I'm going to uninstall and reinstall the software and see if that helps.

andrefaure commented 2 years ago

Yes the easiest is to simply delete the dimsum environment:

conda env remove --name dimsum

and then reinstall from scratch as before.

If the demo still doesn't work try downgrading r-cairo to version 1.5 as explained above (and double-check this worked by running conda list r-cairo).

(Finally, whenever running the DiMSum demo or on your own data, make sure you activate the dimsum conda environment first.)
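
A quick way to confirm this from the shell is sketched below. It assumes a standard conda setup in which conda activate exports the CONDA_DEFAULT_ENV variable with the name of the active environment; the helper name in_conda_env is hypothetical.

```shell
# Succeeds only if the named conda environment is the active one.
# `conda activate <env>` exports CONDA_DEFAULT_ENV=<env>.
in_conda_env() {
  [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if in_conda_env dimsum; then
  echo "dimsum environment is active"
  # Print where the DiMSum executable resolves from, if it is on PATH.
  command -v DiMSum || echo "DiMSum is not on PATH in this environment"
else
  echo "active environment: ${CONDA_DEFAULT_ENV:-none} (run: conda activate dimsum)"
fi
```

If the reported DiMSum path does not sit under the dimsum environment's directory, the packages were probably installed into a different environment.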

carolinelangley commented 2 years ago

I am still getting the error while running the Demo even after reinstallation.

andrefaure commented 2 years ago

Do you have r-cairo 1.5 or 1.6?

carolinelangley commented 2 years ago

1.5

andrefaure commented 2 years ago

One last thing - please also try downgrading pandoc to 2.16 as follows:

conda install -c conda-forge pandoc=2.16

carolinelangley commented 2 years ago

The error is still persisting.

andrefaure commented 2 years ago

It must be a conflict between some of the dependencies - could you please send the full output of the following command:

conda list

and I will try to figure this out when I get the chance.

Thanks and sorry for the trouble!

carolinelangley commented 2 years ago

[Screenshot: Screen Shot 2022-07-10 at 1 50 14 PM]

andrefaure commented 2 years ago

@carolinelangley it looks like you have no packages installed in the 'dimsum' conda environment... You must have installed them in the default ('base') env by mistake. Running which DiMSum will tell you where DiMSum is installed. Anyway, please deactivate the dimsum env and send me the output of conda list as follows:

conda deactivate
conda list

carolinelangley commented 2 years ago

@andrefaure

clangley@gizmok1:~/dimsum$ conda list
# packages in environment at /home/clangley/miniconda3:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
_r-mutex 1.0.1 anacondar_1 conda-forge
alsa-lib 1.2.3 h516909a_0 conda-forge
binutils_impl_linux-64 2.35.1 h27ae35d_9
binutils_linux-64 2.35 h67ddf6f_30 conda-forge
bioconductor-biobase 2.52.0 r41hd029910_0 bioconda
bioconductor-biocgenerics 0.38.0 r41hdfd78af_0 bioconda
bioconductor-biocparallel 1.26.0 r41h399db7b_0 bioconda
bioconductor-biostrings 2.60.0 r41hd029910_0 bioconda
bioconductor-delayedarray 0.18.0 r41hd029910_0 bioconda
bioconductor-genomeinfodb 1.28.0 r41hdfd78af_0 bioconda
bioconductor-genomeinfodbdata 1.2.6 r41hdfd78af_0 bioconda
bioconductor-genomicalignments 1.28.0 r41hd029910_0 bioconda
bioconductor-genomicranges 1.44.0 r41hd029910_0 bioconda
bioconductor-iranges 2.26.0 r41hd029910_0 bioconda
bioconductor-matrixgenerics 1.4.0 r41hdfd78af_0 bioconda
bioconductor-rhtslib 1.24.0 r41hd029910_0 bioconda
bioconductor-rsamtools 2.8.0 r41h399db7b_0 bioconda
bioconductor-s4vectors 0.30.0 r41hd029910_0 bioconda
bioconductor-shortread 1.50.0 r41h399db7b_0 bioconda
bioconductor-summarizedexperiment 1.22.0 r41hdfd78af_0 bioconda
bioconductor-xvector 0.32.0 r41hd029910_0 bioconda
bioconductor-zlibbioc 1.38.0 r41hd029910_0 bioconda
brotlipy 0.7.0 py39h27cfd23_1003
bwidget 1.9.14 ha770c72_1 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.17.1 h7f98852_1 conda-forge
ca-certificates 2022.6.15 ha878542_0 conda-forge
cairo 1.16.0 h6cf1ce9_1008 conda-forge
certifi 2022.6.15 py39hf3d152e_0 conda-forge
cffi 1.15.0 py39hd667e15_1
charset-normalizer 2.0.4 pyhd3eb1b0_0
colorama 0.4.4 pyhd3eb1b0_0
conda 4.13.0 py39hf3d152e_1 conda-forge
conda-content-trust 0.1.1 pyhd3eb1b0_0
conda-package-handling 1.8.1 py39h7f8727e_0
cryptography 36.0.0 py39h9ce1e76_0
curl 7.78.0 hea6ffbf_0 conda-forge
cutadapt 3.4 py39h38f01e4_1 bioconda
dnaio 0.5.1 py39h38f01e4_0 bioconda
fastqc 0.11.9 hdfd78af_1 bioconda
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.13.1 hba837de_1005 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
freetype 2.10.4 h0708190_1 conda-forge
fribidi 1.0.10 h36c2ea0_0 conda-forge
gcc_impl_linux-64 9.3.0 h70c0ae5_19 conda-forge
gcc_linux-64 9.3.0 hf25ea35_30 conda-forge
gettext 0.19.8.1 h0b5b191_1005 conda-forge
gfortran_impl_linux-64 9.3.0 hc4a2995_19 conda-forge
gfortran_linux-64 9.3.0 hdc58fab_30 conda-forge
giflib 5.2.1 h36c2ea0_2 conda-forge
graphite2 1.3.13 h58526e2_1001 conda-forge
gsl 2.6 he838d99_2 conda-forge
gxx_impl_linux-64 9.3.0 hd87eabc_19 conda-forge
gxx_linux-64 9.3.0 h3fbe746_30 conda-forge
harfbuzz 2.8.2 h83ec7ef_0 conda-forge
icu 68.1 h58526e2_0 conda-forge
idna 3.3 pyhd3eb1b0_0
isa-l 2.30.0 ha770c72_4 conda-forge
jbig 2.1 h7f98852_2003 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
kernel-headers_linux-64 2.6.32 he073ed8_15 conda-forge
krb5 1.19.2 hcc1bbae_0 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.35.1 h7274673_9
lerc 2.2.1 h9c3ff4c_0 conda-forge
libblas 3.9.0 11_linux64_openblas conda-forge
libcblas 3.9.0 11_linux64_openblas conda-forge
libcurl 7.78.0 h2574ce0_0 conda-forge
libdeflate 1.7 h7f98852_5 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.3 he6710b0_2
libgcc-devel_linux-64 9.3.0 h7864c58_19 conda-forge
libgcc-ng 9.3.0 h5101ec6_17
libgfortran-ng 9.4.0 h69a702a_16 conda-forge
libgfortran5 9.4.0 h62347ff_16 conda-forge
libglib 2.68.3 h3e27bee_0 conda-forge
libgomp 9.3.0 h5101ec6_17
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 11_linux64_openblas conda-forge
libnghttp2 1.43.0 h812cca2_0 conda-forge
libopenblas 0.3.17 pthreads_h8fe5266_1 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libssh2 1.9.0 ha56f1ee_6 conda-forge
libstdcxx-devel_linux-64 9.3.0 hb016644_19 conda-forge
libstdcxx-ng 9.3.0 hd4cf53a_17
libtiff 4.3.0 hf544144_1 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libwebp-base 1.2.0 h7f98852_2 conda-forge
libxcb 1.13 h7f98852_1003 conda-forge
libxml2 2.9.12 h72842e0_0 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
make 4.3 hd18ef5c_1 conda-forge
ncurses 6.3 h7f8727e_2
openjdk 11.0.9.1 h5cc2fde_1 conda-forge
openssl 1.1.1o h7f8727e_0
pandoc 2.16.2 h7f98852_0 conda-forge
pango 1.48.7 hb8ff022_0 conda-forge
pbzip2 1.1.13 0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pcre2 10.37 h032f7d1_0 conda-forge
perl 5.32.1 0_h7f98852_perl5 conda-forge
pigz 2.6 h27826a3_0 conda-forge
pip 21.2.4 py39h06a4308_0
pixman 0.40.0 h36c2ea0_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pycosat 0.6.3 py39h27cfd23_0
pycparser 2.21 pyhd3eb1b0_0
pyopenssl 22.0.0 pyhd3eb1b0_0
pysocks 1.7.1 py39h06a4308_0
python 3.9.12 h12debd9_0
python-isal 0.11.0 py39h3811e60_0 conda-forge
python_abi 3.9 2_cp39 conda-forge
r-ade4 1.7_17 r41he454529_0 conda-forge
r-assertthat 0.2.1 r41hc72bb7e_2 conda-forge
r-backports 1.2.1 r41hcfec24a_0 conda-forge
r-base 4.1.0 hb67fd72_2 conda-forge
r-base64enc 0.1_3 r41hcfec24a_1004 conda-forge
r-bh 1.78.0_0 r41hc72bb7e_0 conda-forge
r-bitops 1.0_7 r41hcfec24a_0 conda-forge
r-brio 1.1.2 r41hcfec24a_0 conda-forge
r-cairo 1.5_12.2 r41hcfec24a_0 conda-forge
r-callr 3.7.0 r41hc72bb7e_0 conda-forge
r-cli 3.0.1 r41h03ef668_1 conda-forge
r-colorspace 2.0_2 r41hcfec24a_0 conda-forge
r-cowplot 1.1.1 r41hc72bb7e_0 conda-forge
r-crayon 1.5.1 r41hc72bb7e_0 conda-forge
r-data.table 1.14.0 r41hcfec24a_0 conda-forge
r-desc 1.4.1 r41hc72bb7e_0 conda-forge
r-diffobj 0.3.4 r41hcfec24a_0 conda-forge
r-digest 0.6.27 r41h03ef668_0 conda-forge
r-dimsum 1.2.8 r41hdfd78af_0 bioconda
r-dplyr 1.0.7 r41h03ef668_0 conda-forge
r-ellipsis 0.3.2 r41hcfec24a_0 conda-forge
r-evaluate 0.15 r41hc72bb7e_0 conda-forge
r-fansi 0.4.2 r41hcfec24a_0 conda-forge
r-farver 2.1.0 r41h03ef668_0 conda-forge
r-forcats 0.5.1 r41hc72bb7e_0 conda-forge
r-formatr 1.12 r41hc72bb7e_0 conda-forge
r-futile.logger 1.4.3 r41hc72bb7e_1003 conda-forge
r-futile.options 1.0.1 r41hc72bb7e_1002 conda-forge
r-generics 0.1.3 r41hc72bb7e_0 conda-forge
r-getopt 1.20.3 r41ha770c72_2 conda-forge
r-ggally 2.1.2 r41hc72bb7e_0 conda-forge
r-ggplot2 3.3.6 r41hc72bb7e_0 conda-forge
r-glue 1.4.2 r41hcfec24a_0 conda-forge
r-gridextra 2.3 r41hc72bb7e_1003 conda-forge
r-gtable 0.3.0 r41hc72bb7e_3 conda-forge
r-hexbin 1.28.2 r41h859d828_0 conda-forge
r-highr 0.9 r41hc72bb7e_0 conda-forge
r-hms 1.1.1 r41hc72bb7e_0 conda-forge
r-htmltools 0.5.1.1 r41h03ef668_0 conda-forge
r-hwriter 1.3.2.1 r41hc72bb7e_0 conda-forge
r-isoband 0.2.5 r41h03ef668_0 conda-forge
r-jpeg 0.1_8.1 r41hcfec24a_1 conda-forge
r-jquerylib 0.1.4 r41hc72bb7e_0 conda-forge
r-jsonlite 1.7.2 r41hcfec24a_0 conda-forge
r-knitr 1.37 r41hc72bb7e_0 conda-forge
r-labeling 0.4.2 r41hc72bb7e_1 conda-forge
r-lambda.r 1.2.4 r41hc72bb7e_1 conda-forge
r-lattice 0.20_44 r41hcfec24a_0 conda-forge
r-latticeextra 0.6_29 r41hc72bb7e_1 conda-forge
r-lifecycle 1.0.1 r41hc72bb7e_0 conda-forge
r-magrittr 2.0.1 r41hcfec24a_1 conda-forge
r-markdown 1.1 r41hcfec24a_1 conda-forge
r-mass 7.3_54 r41hcfec24a_0 conda-forge
r-matrix 1.3_4 r41he454529_0 conda-forge
r-matrixstats 0.60.0 r41hcfec24a_0 conda-forge
r-mgcv 1.8_36 r41he454529_0 conda-forge
r-mime 0.11 r41hcfec24a_0 conda-forge
r-munsell 0.5.0 r41hc72bb7e_1004 conda-forge
r-nlme 3.1_152 r41h859d828_0 conda-forge
r-optparse 1.7.1 r41hc72bb7e_0 conda-forge
r-pillar 1.7.0 r41hc72bb7e_0 conda-forge
r-pixmap 0.4_12 r41hc72bb7e_0 conda-forge
r-pkgconfig 2.0.3 r41hc72bb7e_1 conda-forge
r-pkgload 1.2.1 r41h03ef668_0 conda-forge
r-plyr 1.8.6 r41h03ef668_1 conda-forge
r-png 0.1_7 r41hcfec24a_1004 conda-forge
r-praise 1.0.0 r41hc72bb7e_1005 conda-forge
r-prettyunits 1.1.1 r41hc72bb7e_1 conda-forge
r-processx 3.5.2 r41hcfec24a_0 conda-forge
r-progress 1.2.2 r41hc72bb7e_2 conda-forge
r-ps 1.6.0 r41hcfec24a_0 conda-forge
r-purrr 0.3.4 r41hcfec24a_1 conda-forge
r-r6 2.5.1 r41hc72bb7e_0 conda-forge
r-rcolorbrewer 1.1_3 r41h785f33e_0 conda-forge
r-rcpp 1.0.7 r41h03ef668_0 conda-forge
r-rcurl 1.98_1.3 r41hcfec24a_0 conda-forge
r-rematch2 2.1.2 r41hc72bb7e_1 conda-forge
r-reshape 0.8.9 r41hc72bb7e_0 conda-forge
r-reshape2 1.4.4 r41h03ef668_1 conda-forge
r-rlang 0.4.11 r41hcfec24a_0 conda-forge
r-rmarkdown 2.11 r41hc72bb7e_1 conda-forge
r-rprojroot 2.0.3 r41hc72bb7e_0 conda-forge
r-rstudioapi 0.13 r41hc72bb7e_0 conda-forge
r-scales 1.2.0 r41hc72bb7e_0 conda-forge
r-segmented 1.5_0 r41hc72bb7e_0 conda-forge
r-seqinr 4.2_8 r41hcfec24a_0 conda-forge
r-snow 0.4_4 r41hc72bb7e_0 conda-forge
r-sp 1.4_5 r41hcfec24a_0 conda-forge
r-stringi 1.7.3 r41hcabe038_0 conda-forge
r-stringr 1.4.0 r41hc72bb7e_2 conda-forge
r-testthat 3.0.4 r41h03ef668_0 conda-forge
r-tibble 3.1.3 r41hcfec24a_0 conda-forge
r-tidyr 1.1.3 r41h03ef668_0 conda-forge
r-tidyselect 1.1.2 r41hc72bb7e_0 conda-forge
r-tinytex 0.40 r41hc72bb7e_0 conda-forge
r-utf8 1.2.2 r41hcfec24a_0 conda-forge
r-vctrs 0.3.8 r41hcfec24a_1 conda-forge
r-viridislite 0.4.0 r41hc72bb7e_0 conda-forge
r-waldo 0.3.1 r41hc72bb7e_0 conda-forge
r-withr 2.5.0 r41hc72bb7e_0 conda-forge
r-xfun 0.24 r41h03ef668_0 conda-forge
r-yaml 2.2.1 r41hcfec24a_1 conda-forge
readline 8.1.2 h7f8727e_1
requests 2.27.1 pyhd3eb1b0_0
ruamel_yaml 0.15.100 py39h27cfd23_0
sed 4.8 he412f7d_0 conda-forge
setuptools 61.2.0 py39h06a4308_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.38.2 hc218d9a_0
starcode 1.4 h779adbc_1 bioconda
sysroot_linux-64 2.12 he073ed8_15 conda-forge
tk 8.6.11 h1ccaba5_0
tktable 2.10 hb7b940f_3 conda-forge
tqdm 4.63.0 pyhd3eb1b0_0
tzdata 2022a hda174b7_0
urllib3 1.26.8 pyhd3eb1b0_0
vsearch 2.17.1 h95f258a_0 bioconda
wheel 0.37.1 pyhd3eb1b0_0
xopen 1.5.0 py39hf3d152e_0 conda-forge
xorg-fixesproto 5.0 h7f98852_1002 conda-forge
xorg-inputproto 2.3.2 h7f98852_1002 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.0.10 h7f98852_0 conda-forge
xorg-libsm 1.2.3 hd9c2040_1000 conda-forge
xorg-libx11 1.7.2 h7f98852_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h7f98852_1 conda-forge
xorg-libxfixes 5.0.3 h7f98852_1004 conda-forge
xorg-libxi 1.7.10 h7f98852_0 conda-forge
xorg-libxrender 0.9.10 h7f98852_1003 conda-forge
xorg-libxt 1.2.1 h7f98852_2 conda-forge
xorg-libxtst 1.2.3 h7f98852_1002 conda-forge
xorg-recordproto 1.14.2 h7f98852_1002 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h7f98852_1002 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0
zlib 1.2.12 h7f8727e_1
zstd 1.5.0 ha95c52a_0 conda-forge
clangley@gizmok1:~/dimsum$

carolinelangley commented 2 years ago

Ok I finally got the demo working, you were completely correct. Thanks for bearing with me. Now I continue to run into the issue I originally posted about: the WT variant not being found.

Of the region sequenced, the first 63 base pairs are constant, and after that each position is mutated to encode every possible amino acid. The 3' end does not have any constant region, as the primers I used to add Illumina UMIs immediately followed a mutagenized codon.

andrefaure commented 2 years ago

Great! I suggest doing a simple 'grep' for your WT variant in your raw FASTQ files to reassure yourself that the design is as you expect. If you do find it, could you paste a few lines of the result here to verify (ideally full FASTQ record = 4 lines with qualities)?

You can also try manually searching for the WT variant in the following temp files (from stage 3):

./DiMSum_Project/tmp/3_tally/A3G1203preselection_e1_s0_bNA_t1.vsearch.unique
./DiMSum_Project/tmp/3_tally/A3G1203postselection_e1_s1_b1_t1.vsearch.unique

If it is indeed not in these files, the WT variant is either not present in the original FASTQ files or has been filtered out by one of the previous stages e.g. no constant region match found, poor base quality, no alignment etc.
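
One possible way to run this search is sketched below. It counts lines that contain a query sequence or its reverse complement, reading from standard input so that it works for both plain and gzipped files. The helper names revcomp and count_hits are made up here, the query is a 30-nt stretch taken from the WT sequence in this thread (a 250 bp read cannot contain a longer WT sequence in full, so a short stretch is a safer query for raw reads), and the FASTQ file name is hypothetical.

```shell
# Reverse-complement an uppercase DNA string (A/C/G/T only).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

# Count input lines containing the query or its reverse complement
# (case-insensitive). Reads from stdin; prints the number of matching lines.
count_hits() {
  grep -c -i -e "$1" -e "$(revcomp "$1")"
}

# A 30-nt stretch of the WT sequence used in this thread.
wt_prefix=GTGGACAGGATGAAGATTAAAACATGGAAA

# Example usage (the FASTQ file name is a placeholder):
# gzip -dc fastq/sample_R1.fastq.gz | count_hits "$wt_prefix"
# count_hits "$wt_prefix" < ./DiMSum_Project/tmp/3_tally/A3G1203preselection_e1_s0_bNA_t1.vsearch.unique
```

A count near zero in the raw FASTQs points at the library or the query itself; a healthy count in the FASTQs but not in the stage 3 files points at filtering in stages 1 to 3.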

carolinelangley commented 2 years ago

Hmm I see my WT sequence represented and I also added a --cutadapt5Second argument. All of my reads are getting trimmed, but fewer than 50% of them are aligning; the rest are getting thrown out in Stage 2.

carolinelangley commented 2 years ago

Here is some example data (fastq.zip) and my experiment design file (experimentaldesign2.txt); the exact command I've been running is in my next comment. I'm really not sure what's wrong: I've been trying different parameters in the arguments and still get the same error.

carolinelangley commented 2 years ago

DiMSum --fastqFileDir fastq --experimentDesignPath experimentaldesign2.txt --wildtypeSequence GTGGACAGGATGAAGATTAAAACATGGAAAAGTTTAGTAAAGCATCATATGTATGTTTCAAAGAAGGCTAGGAGATGGTTTTATAGACATCACTATGAAAGCACTCATCCAAAAATAAGTTCAGAAGTACACATCCCACTAGAGAAGGGTGAATTGGTAGTAACAACATATTGGGGTCTGCATACAGGAGAAAGAGACTGGCATTTGGGTCAGGGAGTCTCCATAGAATGGAGGAAAGGGAGATATAGCACACAAGTAGACCCTGACCTAGCAGACCAACTAATTCATCTGTATTACTTTGACTGTTTTTCA --mixedSubstitutions T --mutagenesisType codon --cutadapt5First='ATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAA;required...GACTCTGCT;optional' --vsearchMaxee 0.8 --cutadaptErrorRate 0.8

andrefaure commented 2 years ago

The following works for me i.e. the WT is amongst the most abundant in the filtered variants. You need to include the 3' constant region=GACTCTGCT as optional in the first read (it is normally not sequenced for variants matching the WT length=312) and required for the second read in the pair (present in 96% of reads in the FASTQs you shared):

DiMSum --fastqFileDir fastq --experimentDesignPath experimentaldesign2.txt --wildtypeSequence GTGGACAGGATGAAGATTAAAACATGGAAAAGTTTAGTAAAGCATCATATGTATGTTTCAAAGAAGGCTAGGAGATGGTTTTATAGACATCACTATGAAAGCACTCATCCAAAAATAAGTTCAGAAGTACACATCCCACTAGAGAAGGGTGAATTGGTAGTAACAACATATTGGGGTCTGCATACAGGAGAAAGAGACTGGCATTTGGGTCAGGGAGTCTCCATAGAATGGAGGAAAGGGAGATATAGCACACAAGTAGACCCTGACCTAGCAGACCAACTAATTCATCTGTATTACTTTGACTGTTTTTCA --mixedSubstitutions T --mutagenesisType codon --cutadapt5First='ATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAA;required...GACTCTGCT;optional' --cutadapt5Second='AGCAGAGTC;required...TTGCCACACAATCATCACCTGCCATCTGTTTTCCAT;optional' --retainIntermediateFiles T
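
The second-read constant regions in this command are the reverse complements of the first-read ones, with the two ends swapped. A small sketch for deriving them from the first-read sequences (revcomp is a helper defined here, not part of DiMSum, and it assumes rev and tr are available on your system):

```shell
# Reverse-complement an uppercase DNA string (A/C/G/T only).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

five_prime=ATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAA   # read 1, 5' constant region
three_prime=GACTCTGCT                              # read 1, 3' constant region

# Read 2 starts at the opposite end of the fragment, so it sees the
# reverse complement of the 3' region first and of the 5' region last.
echo "read 2, 5' constant region: $(revcomp "$three_prime")"   # AGCAGAGTC
echo "read 2, 3' constant region: $(revcomp "$five_prime")"    # TTGCCACACAATCATCACCTGCCATCTGTTTTCCAT
```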

By the way, setting the cutadapt error rate ('cutadaptErrorRate') to 80% is not advisable because you will get spurious matches.
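
To put rough numbers on this: as far as the cutadapt documentation describes it, the number of errors tolerated in a match is floor(matched length × maximum error rate). The sketch below applies that formula to the two constant regions from this thread; max_errors is a hypothetical helper, and 0.1 is included only as a comparison point (it is cutadapt's own default, which may differ from the DiMSum default).

```shell
# Allowed errors for a full-length match: floor(length * error_rate).
max_errors() {
  awk -v len="$1" -v rate="$2" 'BEGIN { print int(len * rate) }'
}

max_errors 36 0.8   # 36-nt 5' constant region at rate 0.8 -> 28
max_errors 9 0.8    # 9-nt 3' constant region at rate 0.8  -> 7
max_errors 36 0.1   # 36-nt region at rate 0.1             -> 3
```

With 28 of 36 positions allowed to mismatch, almost any sequence qualifies as a match, which is why such a high rate produces spurious trimming.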

I'm closing this issue now that you have a working installation of DiMSum.