Closed carolinelangley closed 2 years ago
Hi @carolinelangley sorry you're struggling to get it to work with your data.
Hi @andrefaure thanks for the reply!
I am running the most recent version, followed the installation instructions to a T. I do not have biological replicates for these samples. The data is paired end, 250 bp in both directions, and was read using Illumina. The total size of the gene is 381 bp, and there is a 60 bp stretch of constant region on the 5' end before my DMS mutagenesis starts. These samples are not barcoded.
Hi @carolinelangley,
I think the issue is that currently very few of your reads (if any) are making it to stage 4. (I'm not sure why DiMSum isn't throwing a more informative error but I'll look into this).
The most likely reason for this is that you haven't specified the TRIM arguments: https://github.com/lehner-lab/DiMSum/blob/master/docs/ARGUMENTS.md#trim-arguments
DiMSum --fastqFileDir fastq --experimentDesignPath experimentaldesign.txt --wildtypeSequence ATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAAGTGGACAGGATGAAGATTAAAACATGGAAAAGTTTAGTAAAGCATCATATGTATGTTTCAAAGAAGGCTAGGAGATGGTTTTATAGACATCACTATGAAAGCACTCATCCAAAAATAAGTTCAGAAGTACACATCCCACTAGAGAAGGGTGAATTGGTAGTAACAACATATTGGGGTCTGCATACAGGAGAAAGAGACTGGCATTTGGGTCAGGGAGTCTCCATAGAATGGAGGAAAGGGAGATATAGCACACAAGTAGACCCTGACCTAGCAGACCAACTAATTCATCTGTATTACTTTGACTGTTTTTCAGAATCTGCT --mixedSubstitutions T --mutagenesisType codon --cutadapt5First ATTTCAGGTGTCGTGAGCGGCCGC
This will allow partial (and error-tolerant) matches of the constant region.
./DiMSum_Project/tmp/3_tally/A3G1203preselection_e1_s0_bNA_t1.vsearch.unique
./DiMSum_Project/tmp/3_tally/A3G1203postselection_e1_s1_b1_t1.vsearch.unique
Lastly it's always a good idea to directly inspect a fragment of the raw FASTQ files (e.g. in a text editor) before running DiMSum to check that the sequences indeed match your expectation...
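A quick way to do that inspection from the command line, as a minimal sketch: the helper name fastq_head and the file name in the usage comment are made up for illustration, and it assumes standard 4-line FASTQ records.

```shell
# Print the first N FASTQ records (4 lines each) from a plain or gzipped file.
# Usage: fastq_head FILE [N]   (N defaults to 2)
fastq_head() {
  fq_file="$1"
  fq_n="${2:-2}"
  case "$fq_file" in
    *.gz) gzip -dc "$fq_file" | head -n "$((fq_n * 4))" ;;
    *)    head -n "$((fq_n * 4))" "$fq_file" ;;
  esac
}

# Example with a hypothetical file name:
# fastq_head fastq/my_sample_R1.fastq.gz 2
```

Line 2 of each record is the read itself, so you can compare it by eye against the constant region and the start of the wild-type sequence.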
Let me know if you're still having issues after troubleshooting these things!
Hi @andrefaure I really appreciate all of the help, now I'm running into this issue:
There were problems while running 'dimsum__cutadapt_report'
Error in dimsum_stage_cutadapt(dimsum_meta = pipeline[["1_split"]], cutadapt_outpath = file.path(pipeline[["1_split"]][["tmp_path"]], :
object 'dimsum_meta_new_report' not found
Calls: dimsum -> dimsum_stage_cutadapt
Execution halted
(dimsum) clangley@gizmok45:~/dimsum$
Hi @carolinelangley are you running DiMSum on Linux or Mac? Please also send me the result when running the following on the command line (with dimsum conda environment activated):
conda list r-cairo
Hi @andrefaure My computer is a Mac but I have been using a remote server to run the software.
(dimsum) clangley@gizmok45:~/dimsum$ conda list r-cairo
#
r-cairo 1.6_0 r41h06615bd_0 conda-forge
Ok it seems that r-cairo 1.6 breaks pandoc on Linux. You will need to downgrade to 1.5 as follows (again make sure the dimsum conda environment is activated):
conda install -c conda-forge r-cairo=1.5
Then try running DiMSum again (no need to reinstall) and let me know if that fixes it!
@andrefaure Hmm I am still getting the same error as above.
Ok could you please run the full demo to double-check that you have a working installation? https://github.com/lehner-lab/DiMSum/blob/master/docs/DEMO.md#full-dimsum-demo-wrapsteam
If that works fine then the best would be if you could share your experiment design file and (a small fragment e.g. 100 reads of each of) the FASTQs you're using so that I can try to reproduce the error. If the FASTQ fragments are too large to attach here then you could try google drive/ dropbox/ ftp or some other file sharing service.
I'm having the same error with the demo, I'm going to uninstall and reinstall the software and see if that helps.
Yes the easiest is to simply delete the dimsum environment:
conda env remove --name dimsum
and then reinstall from scratch as before.
If the demo still doesn't work try downgrading r-cairo to version 1.5 as explained above (and double-check this worked by running conda list r-cairo).
(Finally, whenever running the DiMSum demo or on your own data, make sure you activate the dimsum conda environment first.)
I am still getting the error while running the Demo even after reinstallation.
Do you have r-cairo 1.5 or 1.6?
1.5
One last thing - please also try downgrading pandoc to 2.16 as follows:
conda install -c conda-forge pandoc=2.16
The error is still persisting.
It must be a conflict between some of the dependencies - could you please send the full output of the following command:
conda list
and I will try to figure this out when I get the chance.
Thanks and sorry for the trouble!
@carolinelangley it looks like you have no packages installed in the 'dimsum' conda environment... You must have installed them in the default ('base') env by mistake. If you do which DiMSum it will tell you where DiMSum is installed. Anyway please deactivate the dimsum env and send me the output of conda list as follows:
conda deactivate
conda list
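For anyone hitting the same thing, here is a rough sketch of the "which DiMSum" check wrapped in a small helper. The function name in_active_env is made up for illustration; it relies only on CONDA_PREFIX, the variable conda sets to the active environment's path when you activate it.

```shell
# Report whether an executable on PATH lives inside the active conda env.
# Returns 0 if inside, 1 if not found at all, 2 if found somewhere else.
in_active_env() {
  exe_path=$(command -v "$1") || { echo "$1: not found on PATH"; return 1; }
  case "$exe_path" in
    "${CONDA_PREFIX:-/nonexistent}"/*)
      echo "$1: $exe_path (inside active env)" ;;
    *)
      echo "$1: $exe_path (outside active env)"; return 2 ;;
  esac
}

# Example, run after activating the dimsum env:
# in_active_env DiMSum
```

If this reports "outside active env" while the dimsum env is activated, the packages most likely went into a different environment (e.g. 'base').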
@andrefaure
clangley@gizmok1:~/dimsum$ conda list
#
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
_r-mutex 1.0.1 anacondar_1 conda-forge
alsa-lib 1.2.3 h516909a_0 conda-forge
binutils_impl_linux-64 2.35.1 h27ae35d_9
binutils_linux-64 2.35 h67ddf6f_30 conda-forge
bioconductor-biobase 2.52.0 r41hd029910_0 bioconda
bioconductor-biocgenerics 0.38.0 r41hdfd78af_0 bioconda
bioconductor-biocparallel 1.26.0 r41h399db7b_0 bioconda
bioconductor-biostrings 2.60.0 r41hd029910_0 bioconda
bioconductor-delayedarray 0.18.0 r41hd029910_0 bioconda
bioconductor-genomeinfodb 1.28.0 r41hdfd78af_0 bioconda
bioconductor-genomeinfodbdata 1.2.6 r41hdfd78af_0 bioconda
bioconductor-genomicalignments 1.28.0 r41hd029910_0 bioconda
bioconductor-genomicranges 1.44.0 r41hd029910_0 bioconda
bioconductor-iranges 2.26.0 r41hd029910_0 bioconda
bioconductor-matrixgenerics 1.4.0 r41hdfd78af_0 bioconda
bioconductor-rhtslib 1.24.0 r41hd029910_0 bioconda
bioconductor-rsamtools 2.8.0 r41h399db7b_0 bioconda
bioconductor-s4vectors 0.30.0 r41hd029910_0 bioconda
bioconductor-shortread 1.50.0 r41h399db7b_0 bioconda
bioconductor-summarizedexperiment 1.22.0 r41hdfd78af_0 bioconda
bioconductor-xvector 0.32.0 r41hd029910_0 bioconda
bioconductor-zlibbioc 1.38.0 r41hd029910_0 bioconda
brotlipy 0.7.0 py39h27cfd23_1003
bwidget 1.9.14 ha770c72_1 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.17.1 h7f98852_1 conda-forge
ca-certificates 2022.6.15 ha878542_0 conda-forge
cairo 1.16.0 h6cf1ce9_1008 conda-forge
certifi 2022.6.15 py39hf3d152e_0 conda-forge
cffi 1.15.0 py39hd667e15_1
charset-normalizer 2.0.4 pyhd3eb1b0_0
colorama 0.4.4 pyhd3eb1b0_0
conda 4.13.0 py39hf3d152e_1 conda-forge
conda-content-trust 0.1.1 pyhd3eb1b0_0
conda-package-handling 1.8.1 py39h7f8727e_0
cryptography 36.0.0 py39h9ce1e76_0
curl 7.78.0 hea6ffbf_0 conda-forge
cutadapt 3.4 py39h38f01e4_1 bioconda
dnaio 0.5.1 py39h38f01e4_0 bioconda
fastqc 0.11.9 hdfd78af_1 bioconda
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 hab24e00_0 conda-forge
fontconfig 2.13.1 hba837de_1005 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
freetype 2.10.4 h0708190_1 conda-forge
fribidi 1.0.10 h36c2ea0_0 conda-forge
gcc_impl_linux-64 9.3.0 h70c0ae5_19 conda-forge
gcc_linux-64 9.3.0 hf25ea35_30 conda-forge
gettext 0.19.8.1 h0b5b191_1005 conda-forge
gfortran_impl_linux-64 9.3.0 hc4a2995_19 conda-forge
gfortran_linux-64 9.3.0 hdc58fab_30 conda-forge
giflib 5.2.1 h36c2ea0_2 conda-forge
graphite2 1.3.13 h58526e2_1001 conda-forge
gsl 2.6 he838d99_2 conda-forge
gxx_impl_linux-64 9.3.0 hd87eabc_19 conda-forge
gxx_linux-64 9.3.0 h3fbe746_30 conda-forge
harfbuzz 2.8.2 h83ec7ef_0 conda-forge
icu 68.1 h58526e2_0 conda-forge
idna 3.3 pyhd3eb1b0_0
isa-l 2.30.0 ha770c72_4 conda-forge
jbig 2.1 h7f98852_2003 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
kernel-headers_linux-64 2.6.32 he073ed8_15 conda-forge
krb5 1.19.2 hcc1bbae_0 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.35.1 h7274673_9
lerc 2.2.1 h9c3ff4c_0 conda-forge
libblas 3.9.0 11_linux64_openblas conda-forge
libcblas 3.9.0 11_linux64_openblas conda-forge
libcurl 7.78.0 h2574ce0_0 conda-forge
libdeflate 1.7 h7f98852_5 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libffi 3.3 he6710b0_2
libgcc-devel_linux-64 9.3.0 h7864c58_19 conda-forge
libgcc-ng 9.3.0 h5101ec6_17
libgfortran-ng 9.4.0 h69a702a_16 conda-forge
libgfortran5 9.4.0 h62347ff_16 conda-forge
libglib 2.68.3 h3e27bee_0 conda-forge
libgomp 9.3.0 h5101ec6_17
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 11_linux64_openblas conda-forge
libnghttp2 1.43.0 h812cca2_0 conda-forge
libopenblas 0.3.17 pthreads_h8fe5266_1 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libssh2 1.9.0 ha56f1ee_6 conda-forge
libstdcxx-devel_linux-64 9.3.0 hb016644_19 conda-forge
libstdcxx-ng 9.3.0 hd4cf53a_17
libtiff 4.3.0 hf544144_1 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libwebp-base 1.2.0 h7f98852_2 conda-forge
libxcb 1.13 h7f98852_1003 conda-forge
libxml2 2.9.12 h72842e0_0 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
make 4.3 hd18ef5c_1 conda-forge
ncurses 6.3 h7f8727e_2
openjdk 11.0.9.1 h5cc2fde_1 conda-forge
openssl 1.1.1o h7f8727e_0
pandoc 2.16.2 h7f98852_0 conda-forge
pango 1.48.7 hb8ff022_0 conda-forge
pbzip2 1.1.13 0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pcre2 10.37 h032f7d1_0 conda-forge
perl 5.32.1 0_h7f98852_perl5 conda-forge
pigz 2.6 h27826a3_0 conda-forge
pip 21.2.4 py39h06a4308_0
pixman 0.40.0 h36c2ea0_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pycosat 0.6.3 py39h27cfd23_0
pycparser 2.21 pyhd3eb1b0_0
pyopenssl 22.0.0 pyhd3eb1b0_0
pysocks 1.7.1 py39h06a4308_0
python 3.9.12 h12debd9_0
python-isal 0.11.0 py39h3811e60_0 conda-forge
python_abi 3.9 2_cp39 conda-forge
r-ade4 1.7_17 r41he454529_0 conda-forge
r-assertthat 0.2.1 r41hc72bb7e_2 conda-forge
r-backports 1.2.1 r41hcfec24a_0 conda-forge
r-base 4.1.0 hb67fd72_2 conda-forge
r-base64enc 0.1_3 r41hcfec24a_1004 conda-forge
r-bh 1.78.0_0 r41hc72bb7e_0 conda-forge
r-bitops 1.0_7 r41hcfec24a_0 conda-forge
r-brio 1.1.2 r41hcfec24a_0 conda-forge
r-cairo 1.5_12.2 r41hcfec24a_0 conda-forge
r-callr 3.7.0 r41hc72bb7e_0 conda-forge
r-cli 3.0.1 r41h03ef668_1 conda-forge
r-colorspace 2.0_2 r41hcfec24a_0 conda-forge
r-cowplot 1.1.1 r41hc72bb7e_0 conda-forge
r-crayon 1.5.1 r41hc72bb7e_0 conda-forge
r-data.table 1.14.0 r41hcfec24a_0 conda-forge
r-desc 1.4.1 r41hc72bb7e_0 conda-forge
r-diffobj 0.3.4 r41hcfec24a_0 conda-forge
r-digest 0.6.27 r41h03ef668_0 conda-forge
r-dimsum 1.2.8 r41hdfd78af_0 bioconda
r-dplyr 1.0.7 r41h03ef668_0 conda-forge
r-ellipsis 0.3.2 r41hcfec24a_0 conda-forge
r-evaluate 0.15 r41hc72bb7e_0 conda-forge
r-fansi 0.4.2 r41hcfec24a_0 conda-forge
r-farver 2.1.0 r41h03ef668_0 conda-forge
r-forcats 0.5.1 r41hc72bb7e_0 conda-forge
r-formatr 1.12 r41hc72bb7e_0 conda-forge
r-futile.logger 1.4.3 r41hc72bb7e_1003 conda-forge
r-futile.options 1.0.1 r41hc72bb7e_1002 conda-forge
r-generics 0.1.3 r41hc72bb7e_0 conda-forge
r-getopt 1.20.3 r41ha770c72_2 conda-forge
r-ggally 2.1.2 r41hc72bb7e_0 conda-forge
r-ggplot2 3.3.6 r41hc72bb7e_0 conda-forge
r-glue 1.4.2 r41hcfec24a_0 conda-forge
r-gridextra 2.3 r41hc72bb7e_1003 conda-forge
r-gtable 0.3.0 r41hc72bb7e_3 conda-forge
r-hexbin 1.28.2 r41h859d828_0 conda-forge
r-highr 0.9 r41hc72bb7e_0 conda-forge
r-hms 1.1.1 r41hc72bb7e_0 conda-forge
r-htmltools 0.5.1.1 r41h03ef668_0 conda-forge
r-hwriter 1.3.2.1 r41hc72bb7e_0 conda-forge
r-isoband 0.2.5 r41h03ef668_0 conda-forge
r-jpeg 0.1_8.1 r41hcfec24a_1 conda-forge
r-jquerylib 0.1.4 r41hc72bb7e_0 conda-forge
r-jsonlite 1.7.2 r41hcfec24a_0 conda-forge
r-knitr 1.37 r41hc72bb7e_0 conda-forge
r-labeling 0.4.2 r41hc72bb7e_1 conda-forge
r-lambda.r 1.2.4 r41hc72bb7e_1 conda-forge
r-lattice 0.20_44 r41hcfec24a_0 conda-forge
r-latticeextra 0.6_29 r41hc72bb7e_1 conda-forge
r-lifecycle 1.0.1 r41hc72bb7e_0 conda-forge
r-magrittr 2.0.1 r41hcfec24a_1 conda-forge
r-markdown 1.1 r41hcfec24a_1 conda-forge
r-mass 7.3_54 r41hcfec24a_0 conda-forge
r-matrix 1.3_4 r41he454529_0 conda-forge
r-matrixstats 0.60.0 r41hcfec24a_0 conda-forge
r-mgcv 1.8_36 r41he454529_0 conda-forge
r-mime 0.11 r41hcfec24a_0 conda-forge
r-munsell 0.5.0 r41hc72bb7e_1004 conda-forge
r-nlme 3.1_152 r41h859d828_0 conda-forge
r-optparse 1.7.1 r41hc72bb7e_0 conda-forge
r-pillar 1.7.0 r41hc72bb7e_0 conda-forge
r-pixmap 0.4_12 r41hc72bb7e_0 conda-forge
r-pkgconfig 2.0.3 r41hc72bb7e_1 conda-forge
r-pkgload 1.2.1 r41h03ef668_0 conda-forge
r-plyr 1.8.6 r41h03ef668_1 conda-forge
r-png 0.1_7 r41hcfec24a_1004 conda-forge
r-praise 1.0.0 r41hc72bb7e_1005 conda-forge
r-prettyunits 1.1.1 r41hc72bb7e_1 conda-forge
r-processx 3.5.2 r41hcfec24a_0 conda-forge
r-progress 1.2.2 r41hc72bb7e_2 conda-forge
r-ps 1.6.0 r41hcfec24a_0 conda-forge
r-purrr 0.3.4 r41hcfec24a_1 conda-forge
r-r6 2.5.1 r41hc72bb7e_0 conda-forge
r-rcolorbrewer 1.1_3 r41h785f33e_0 conda-forge
r-rcpp 1.0.7 r41h03ef668_0 conda-forge
r-rcurl 1.98_1.3 r41hcfec24a_0 conda-forge
r-rematch2 2.1.2 r41hc72bb7e_1 conda-forge
r-reshape 0.8.9 r41hc72bb7e_0 conda-forge
r-reshape2 1.4.4 r41h03ef668_1 conda-forge
r-rlang 0.4.11 r41hcfec24a_0 conda-forge
r-rmarkdown 2.11 r41hc72bb7e_1 conda-forge
r-rprojroot 2.0.3 r41hc72bb7e_0 conda-forge
r-rstudioapi 0.13 r41hc72bb7e_0 conda-forge
r-scales 1.2.0 r41hc72bb7e_0 conda-forge
r-segmented 1.5_0 r41hc72bb7e_0 conda-forge
r-seqinr 4.2_8 r41hcfec24a_0 conda-forge
r-snow 0.4_4 r41hc72bb7e_0 conda-forge
r-sp 1.4_5 r41hcfec24a_0 conda-forge
r-stringi 1.7.3 r41hcabe038_0 conda-forge
r-stringr 1.4.0 r41hc72bb7e_2 conda-forge
r-testthat 3.0.4 r41h03ef668_0 conda-forge
r-tibble 3.1.3 r41hcfec24a_0 conda-forge
r-tidyr 1.1.3 r41h03ef668_0 conda-forge
r-tidyselect 1.1.2 r41hc72bb7e_0 conda-forge
r-tinytex 0.40 r41hc72bb7e_0 conda-forge
r-utf8 1.2.2 r41hcfec24a_0 conda-forge
r-vctrs 0.3.8 r41hcfec24a_1 conda-forge
r-viridislite 0.4.0 r41hc72bb7e_0 conda-forge
r-waldo 0.3.1 r41hc72bb7e_0 conda-forge
r-withr 2.5.0 r41hc72bb7e_0 conda-forge
r-xfun 0.24 r41h03ef668_0 conda-forge
r-yaml 2.2.1 r41hcfec24a_1 conda-forge
readline 8.1.2 h7f8727e_1
requests 2.27.1 pyhd3eb1b0_0
ruamel_yaml 0.15.100 py39h27cfd23_0
sed 4.8 he412f7d_0 conda-forge
setuptools 61.2.0 py39h06a4308_0
six 1.16.0 pyhd3eb1b0_1
sqlite 3.38.2 hc218d9a_0
starcode 1.4 h779adbc_1 bioconda
sysroot_linux-64 2.12 he073ed8_15 conda-forge
tk 8.6.11 h1ccaba5_0
tktable 2.10 hb7b940f_3 conda-forge
tqdm 4.63.0 pyhd3eb1b0_0
tzdata 2022a hda174b7_0
urllib3 1.26.8 pyhd3eb1b0_0
vsearch 2.17.1 h95f258a_0 bioconda
wheel 0.37.1 pyhd3eb1b0_0
xopen 1.5.0 py39hf3d152e_0 conda-forge
xorg-fixesproto 5.0 h7f98852_1002 conda-forge
xorg-inputproto 2.3.2 h7f98852_1002 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.0.10 h7f98852_0 conda-forge
xorg-libsm 1.2.3 hd9c2040_1000 conda-forge
xorg-libx11 1.7.2 h7f98852_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h7f98852_1 conda-forge
xorg-libxfixes 5.0.3 h7f98852_1004 conda-forge
xorg-libxi 1.7.10 h7f98852_0 conda-forge
xorg-libxrender 0.9.10 h7f98852_1003 conda-forge
xorg-libxt 1.2.1 h7f98852_2 conda-forge
xorg-libxtst 1.2.3 h7f98852_1002 conda-forge
xorg-recordproto 1.14.2 h7f98852_1002 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h7f98852_1002 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0
zlib 1.2.12 h7f8727e_1
zstd 1.5.0 ha95c52a_0 conda-forge
clangley@gizmok1:~/dimsum$
Ok I finally got the demo working, you were completely correct. Thanks for bearing with me. Now I continue to run into the issue I originally posted about: the WT variant not being found.
Of the region sequenced, the first 63 base pairs are constant, and after that each position is mutated to encode every possible amino acid. The 3' end does not have any constant region, as the primers I used to add Illumina UMIs immediately followed a mutagenized codon.
Great! I suggest doing a simple 'grep' for your WT variant in your raw FASTQ files to reassure yourself that the design is as you expect. If you do find it, could you paste a few lines of the result here to verify (ideally full FASTQ record = 4 lines with qualities)?
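One possible way to run that search, sketched with awk rather than a plain grep so that only sequence lines are tested and whole 4-line records come back. The helper names and the file name in the usage comments are made up, and it assumes unwrapped 4-line FASTQ records. Since a single 250 bp read cannot contain the whole 381 bp gene, the example searches for a fragment (the first 30 nt of the wild-type sequence) instead of the full sequence.

```shell
# Count reads whose sequence line contains SUBSEQ.
# Usage: count_reads_with SUBSEQ FILE
count_reads_with() {
  awk -v s="$1" 'NR % 4 == 2 && index($0, s) { n++ } END { print n + 0 }' "$2"
}

# Print the full 4-line records whose sequence line contains SUBSEQ.
# Usage: show_records_with SUBSEQ FILE
show_records_with() {
  awk -v s="$1" '
    NR % 4 == 1 { h = $0 }      # header line
    NR % 4 == 2 { seq = $0 }    # sequence line
    NR % 4 == 3 { p = $0 }      # "+" separator line
    NR % 4 == 0 { if (index(seq, s)) print h "\n" seq "\n" p "\n" $0 }
  ' "$2"
}

# Examples with a hypothetical, already decompressed file:
# count_reads_with GTGGACAGGATGAAGATTAAAACATGGAAA my_sample_R1.fastq
# show_records_with GTGGACAGGATGAAGATTAAAACATGGAAA my_sample_R1.fastq | head -n 8
```

A count near zero for the forward read would suggest the fragment is wrong or the reads are in the opposite orientation.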
You can also try manually searching for the WT variant in the following temp files (from stage 3):
./DiMSum_Project/tmp/3_tally/A3G1203preselection_e1_s0_bNA_t1.vsearch.unique
./DiMSum_Project/tmp/3_tally/A3G1203postselection_e1_s1_b1_t1.vsearch.unique
If it is indeed not in these files, the WT variant is either not present in the original FASTQ files or has been filtered out by one of the previous stages e.g. no constant region match found, poor base quality, no alignment etc.
Hmm I see my WT sequence represented and I also added a --cutadapt5Second argument. All of my reads are getting trimmed, but less than 50% of them are aligning as they are getting thrown out in Stage 2.
This is the exact command I've been running, and this is some example data: fastq.zip, experimentaldesign2.txt. I'm really not sure what is wrong; I've been trying different parameters in the arguments and still get the same error.
DiMSum --fastqFileDir fastq --experimentDesignPath experimentaldesign2.txt --wildtypeSequence GTGGACAGGATGAAGATTAAAACATGGAAAAGTTTAGTAAAGCATCATATGTATGTTTCAAAGAAGGCTAGGAGATGGTTTTATAGACATCACTATGAAAGCACTCATCCAAAAATAAGTTCAGAAGTACACATCCCACTAGAGAAGGGTGAATTGGTAGTAACAACATATTGGGGTCTGCATACAGGAGAAAGAGACTGGCATTTGGGTCAGGGAGTCTCCATAGAATGGAGGAAAGGGAGATATAGCACACAAGTAGACCCTGACCTAGCAGACCAACTAATTCATCTGTATTACTTTGACTGTTTTTCA --mixedSubstitutions T --mutagenesisType codon --cutadapt5First='ATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAA;required...GACTCTGCT;optional' --vsearchMaxee 0.8 --cutadaptErrorRate 0.8
The following works for me i.e. the WT is amongst the most abundant in the filtered variants. You need to include the 3' constant region=GACTCTGCT as optional in the first read (it is normally not sequenced for variants matching the WT length=312) and required for the second read in the pair (present in 96% of reads in the FASTQs you shared):
DiMSum --fastqFileDir fastq --experimentDesignPath experimentaldesign2.txt --wildtypeSequence GTGGACAGGATGAAGATTAAAACATGGAAAAGTTTAGTAAAGCATCATATGTATGTTTCAAAGAAGGCTAGGAGATGGTTTTATAGACATCACTATGAAAGCACTCATCCAAAAATAAGTTCAGAAGTACACATCCCACTAGAGAAGGGTGAATTGGTAGTAACAACATATTGGGGTCTGCATACAGGAGAAAGAGACTGGCATTTGGGTCAGGGAGTCTCCATAGAATGGAGGAAAGGGAGATATAGCACACAAGTAGACCCTGACCTAGCAGACCAACTAATTCATCTGTATTACTTTGACTGTTTTTCA --mixedSubstitutions T --mutagenesisType codon --cutadapt5First='ATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAA;required...GACTCTGCT;optional' --cutadapt5Second='AGCAGAGTC;required...TTGCCACACAATCATCACCTGCCATCTGTTTTCCAT;optional' --retainIntermediateFiles T
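Note that the two sequences given to --cutadapt5Second in the command above are the reverse complements of the two constant regions given to --cutadapt5First, in swapped order, because the second read is sequenced from the opposite strand. A small sketch to derive them (the helper name revcomp is made up; it uses the standard rev and tr utilities):

```shell
# Reverse-complement a DNA sequence: reverse the string, then swap A<->T and C<->G.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# 3' constant region of read 1 -> required 5' adapter of read 2
revcomp GACTCTGCT
# prints AGCAGAGTC

# 5' constant region of read 1 -> optional 3' adapter of read 2
revcomp ATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAA
# prints TTGCCACACAATCATCACCTGCCATCTGTTTTCCAT
```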
By the way, setting the cutadapt error rate ('cutadaptErrorRate') to 80% is not advisable because you will get spurious matches.
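To put a number on that, here is a back-of-the-envelope sketch. It assumes cutadapt's documented rule that the number of errors tolerated in a match is the matched length multiplied by the maximum error rate, rounded down; the helper name allowed_errors is made up.

```shell
# Errors tolerated for a full-length adapter match: floor(length * error_rate).
allowed_errors() {
  awk -v len="$1" -v rate="$2" 'BEGIN { print int(len * rate) }'
}

allowed_errors 9 0.8    # 9 nt constant region GACTCTGCT at 80%: prints 7
allowed_errors 36 0.8   # 36 nt 5' constant region at 80%: prints 28
allowed_errors 36 0.1   # same region at 0.1 (cutadapt's own default): prints 3
```

With 7 of 9 positions allowed to mismatch, almost any stretch of sequence would count as a "match" to the short constant region.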
I'm closing this issue now that you have a working installation of DiMSum.
Hello, I am trying to process some reads that do not have any barcodes. The command I am running is:
"DiMSum --fastqFileDir fastq --experimentDesignPath experimentaldesign.txt --wildtypeSequence atttcaggtgtcgtgagcggccgcATGGAAAACAGATGGCAGGTGATGATTGTGTGGCAAGTGGACAGGATGAAGATTAAAACATGGAAAAGTTTAGTAAAGCATCATATGTATGTTTCAAAGAAGGCTAGGAGATGGTTTTATAGACATCACTATGAAAGCACTCATCCAAAAATAAGTTCAGAAGTACACATCCCACTAGAGAAGGGTGAATTGGTAGTAACAACATATTGGGGTCTGCATACAGGAGAAAGAGACTGGCATTTGGGTCAGGGAGTCTCCATAGAATGGAGGAAAGGGAGATATAGCACACAAGTAGACCCTGACCTAGCAGACCAACTAATTCATCTGTATTACTTTGACTGTTTTTCAGAATCTGCT --mixedSubstitutions T --mutagenesisType codon"
Every time, I keep getting this error:
DiMSum STAGE 4 (STEAM): PROCESS VARIANT SEQUENCES
Loading variant count files:
./DiMSum_Project/tmp/3_tally/A3G1203preselection_e1_s0_bNA_t1.vsearch.unique
./DiMSum_Project/tmp/3_tally/A3G1203postselection_e1_s1_b1_t1.vsearch.unique
Processing...
A3G1203preselection_e1_s0_bNA_t1.vsearch.unique
A3G1203postselection_e1_s1_b1_t1.vsearch.unique
Processing merged variants...
WT variant not found. Did you mean to specify one of the following?
caught segfault address (nil), cause 'unknown'
Traceback:
1: is.sorted(jval, by = key(x))
2: `[.data.table`(variant_dt[all_reads == T, ], order(mean_count, decreasing = T)[1:5], .(nt_seq = toupper(nt_seq), all_reads, mean_count))
3: variant_dt[all_reads == T, ][order(mean_count, decreasing = T)[1:5], .(nt_seq = toupper(nt_seq), all_reads, mean_count)]
4: print(variant_dt[all_reads == T, ][order(mean_count, decreasing = T)[1:5], .(nt_seq = toupper(nt_seq), all_reads, mean_count)])
5: dimsum__process_merged_variants(dimsum_meta = dimsum_meta, input_dt = variant_data_merge)
6: dimsum_stage_merge(dimsum_meta = pipeline[["3_tally"]], merge_outpath = pipeline[["3_tally"]][["project_path"]], report_outpath = file.path(pipeline[["3_tally"]][["project_path"]], "reports"))
7: dimsum(runDemo = arg_list[["runDemo"]], fastqFileDir = arg_list[["fastqFileDir"]], fastqFileExtension = arg_list[["fastqFileExtension"]], gzipped = arg_list[["gzipped"]], stranded = arg_list[["stranded"]], paired = arg_list[["paired"]], barcodeDesignPath = arg_list[["barcodeDesignPath"]], barcodeErrorRate = arg_list[["barcodeErrorRate"]], experimentDesignPath = arg_list[["experimentDesignPath"]], experimentDesignPairDuplicates = arg_list[["experimentDesignPairDuplicates"]], barcodeIdentityPath = arg_list[["barcodeIdentityPath"]], countPath = arg_list[["countPath"]], cutadaptCut5First = arg_list[["cutadaptCut5First"]], cutadaptCut5Second = arg_list[["cutadaptCut5Second"]], cutadaptCut3First = arg_list[["cutadaptCut3First"]], cutadaptCut3Second = arg_list[["cutadaptCut3Second"]], cutadapt5First = arg_list[["cutadapt5First"]], cutadapt5Second = arg_list[["cutadapt5Second"]], cutadapt3First = arg_list[["cutadapt3First"]], cutadapt3Second = arg_list[["cutadapt3Second"]], cutadaptMinLength = arg_list[["cutadaptMinLength"]], cutadaptErrorRate = arg_list[["cutadaptErrorRate"]], cutadaptOverlap = arg_list[["cutadaptOverlap"]], vsearchMinQual = arg_list[["vsearchMinQual"]], vsearchMaxee = arg_list[["vsearchMaxee"]], vsearchMinovlen = arg_list[["vsearchMinovlen"]], outputPath = arg_list[["outputPath"]], projectName = arg_list[["projectName"]], wildtypeSequence = arg_list[["wildtypeSequence"]], permittedSequences = arg_list[["permittedSequences"]], reverseComplement = arg_list[["reverseComplement"]], sequenceType = arg_list[["sequenceType"]], mutagenesisType = arg_list[["mutagenesisType"]], transLibrary = arg_list[["transLibrary"]], transLibraryReverseComplement = arg_list[["transLibraryReverseComplement"]], bayesianDoubleFitness = arg_list[["bayesianDoubleFitness"]], bayesianDoubleFitnessLamD = arg_list[["bayesianDoubleFitnessLamD"]], fitnessMinInputCountAll = arg_list[["fitnessMinInputCountAll"]], fitnessMinInputCountAny = arg_list[["fitnessMinInputCountAny"]], fitnessMinOutputCountAll = arg_list[["fitnessMinOutputCountAll"]], fitnessMinOutputCountAny = arg_list[["fitnessMinOutputCountAny"]], fitnessHighConfidenceCount = arg_list[["fitnessHighConfidenceCount"]], fitnessDoubleHighConfidenceCount = arg_list[["fitnessDoubleHighConfidenceCount"]], fitnessNormalise = arg_list[["fitnessNormalise"]], fitnessErrorModel = arg_list[["fitnessErrorModel"]], indels = arg_list[["indels"]], maxSubstitutions = arg_list[["maxSubstitutions"]], mixedSubstitutions = arg_list[["mixedSubstitutions"]], retainIntermediateFiles = arg_list[["retainIntermediateFiles"]], splitChunkSize = arg_list[["splitChunkSize"]], retainedReplicates = arg_list[["retainedReplicates"]], startStage = arg_list[["startStage"]], stopStage = arg_list[["stopStage"]], numCores = arg_list[["numCores"]])
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault (core dumped)
Even if I fiddle around with the max mutations allowed, I still get this error. Does anyone know how to resolve this??