limey-bean / Anacapa

Written by Emily Curd (eecurd@g.ucla.edu), Jesse Gomer (jessegomer@gmail.com), Gaurav Kandlikar (gkandlikar@ucla.edu), Zack Gold (zjgold@ucla.edu), Max Ogden (max@maxogden.com), Lenore Pipes (lpipes@berkeley.edu) and Baochen Shi (biosbc@gmail.com). Assistance was provided by Rachel Meyer (rsmeyer@ucla.edu).
MIT License

Problem with the dada2 part of the pipeline #53

Open ucbtmae opened 4 years ago

ucbtmae commented 4 years ago

Hello!

I am trying to learn and run the Anacapa pipeline for my PhD project. I am using macOS Catalina with a Miniconda environment (Python 2.7), and I have both Conda and Homebrew installed. I also have R and RStudio installed.

So far I have managed to install all the dependencies (such as FASTX-Toolkit, dada2, and others) and to run the Anacapa QC, dada2 and BLCA script up to the dada2 step. I am using your examples and the 12S example data to learn how to use the pipeline and how to manage its outputs. I managed to sort out some macOS-specific issues (such as updating Bash and the GCC compiler so that the shell recognises the `&>>` redirection). But there seems to be a problem when the pipeline gets to the dada2 step. Here is the Terminal output:

```
Running in local mode
Using User Defined Primers
Required Arguments Given

Sun 26 Apr 2020 02:21:02 BST

Preprocessing: 1) Generate an md5sum file
Sun 26 Apr 2020 02:21:02 BST
Preprocessing: 2) Change file suffixes
Sun 26 Apr 2020 02:21:02 BST
Preprocessing: 3) Uncompress files
Sun 26 Apr 2020 02:21:02 BST
QC: 1) Run cutadapt to remove 5'sequncing adapters and 3'primers + sequencing adapters, sort for length, and quality.

Generating Primer and Primer + Adapter files for cutadapt steps.
Your adapter type is nextera.

first1000reads-LSC-A-1-S19-L001 ... forward... check reverse... check
Sun 26 Apr 2020 02:21:03 BST
first1000reads-LSC-A-2-S20-L001 ... forward... check reverse... check
Sun 26 Apr 2020 02:21:03 BST
12S

Checking that Paired reads are still paired:
12S ... 12S_first1000reads-LSC-A-1-S19-L001 ...check! 12S_first1000reads-LSC-A-2-S20-L001 ...check!
Sun 26 Apr 2020 02:21:04 BST

Process metabarcode reads for with dada2
12S
Running Dada2 inline
Running dada2 on paired reads 0 moving on
Sun 26 Apr 2020 02:22:00 BST
Running dada2 on forward reads 0 moving on
Sun 26 Apr 2020 02:22:54 BST
Running dada2 on reverse reads 0 moving on
Sun 26 Apr 2020 02:23:48 BST

If a dada2 job fails you can find the run script...
```

When I check the dada2 output, it says this:

```
Downloading GitHub repo benjjneb/dada2@v1.6
checking for file ‘/private/var/folders/f9/rgzykm7d5439h5_mpc1h3lw80000gn/T/Rtmp91tC20/remotesac236df086cb/benjjneb-dada2-553008d/DESCRIPTION’ ...
✔ checking for file ‘/private/var/folders/f9/rgzykm7d5439h5_mpc1h3lw80000gn/T/Rtmp91tC20/remotesac236df086cb/benjjneb-dada2-553008d/DESCRIPTION’
─ preparing ‘dada2’:
checking DESCRIPTION meta-information ...
✔ checking DESCRIPTION meta-information
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘dada2_1.6.0.tar.gz’
```

I have made sure that dada2 is installed (along with the Bioconductor package manager for R) and that I can load it in RStudio, but for some reason the Anacapa script tries to download dada2 version 1.6 from GitHub instead of using the version I already have installed. When I try to install dada2 1.6 myself from R or RStudio, it fails with the same messages as the dada2 Anacapa output file.

I am no expert at meddling with code and shells, but do you think this is something I still have to fix on my side, or is the Anacapa pipeline trying to use dada2 1.6 instead of the version I have installed (the latest version from Bioconductor)? Version 1.6 no longer seems to be available from GitHub or other repositories, so maybe this is why I cannot get the pipeline running.

Do you know what I can do to address this issue? I still can't test the other parts of the Anacapa pipeline.

Thanks a lot and have a great start of the week!

FabianRoger commented 3 years ago

Hi,

I have absolutely no experience with Anacapa, as I just started trying it out. But looking through the scripts, I am fairly confident that your error comes from dada2_unified_script.R, specifically lines 64-72:

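```r
.dada_version = "1.6.0"     # dada2 version Anacapa was written against
.dada_version_gh = "v1.6"   # matching git tag on GitHub
# This block only runs if some version of dada2 is already installed
if("dada2" %in% installed.packages()){
  if(packageVersion("dada2") == .dada_version) {
    cat("congrats, right version of dada2")
  } else {
    # Any other version triggers a source build of v1.6 from GitHub,
    # which is the step that fails in this report
    devtools::install_github("benjjneb/dada2", ref=.dada_version_gh)
  }
}
```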

What happens here is that the script checks whether dada2 is installed and whether it is the right version. If you don't have it installed, or, as in your case, you have a different version installed, it will try to download the version it wants and compile it from source, which fails in your case.

I see two options:

Option 1: find out how to install dada2 version 1.6.0. Maybe you can find some help here.

Option 2: find that script (Anacapa_db/scripts/dada2_unified_script.R), open it in RStudio, comment out the code that checks for the right dada2 version (lines 64 to 83), save the file and try again.

My guess is that it should work, but if anything changed in dada2's output you might get some obscure error further down the line. Option 1 is definitely preferable.
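A possible middle ground between the two options is to replace the `install_github()` branch with a check that only warns about the mismatch, so the pipeline still runs but any dada2 version difference stays visible in the log. This is not something the Anacapa authors suggest; it is a minimal sketch assuming you edit dada2_unified_script.R by hand, and `check_pkg_version()` is a hypothetical helper name.

```r
# Hypothetical helper (not part of Anacapa): warn instead of reinstalling
# when the installed version of a package differs from the required one.
check_pkg_version <- function(pkg, required) {
  # Fail early with a clear message if the package is missing entirely
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(sprintf("%s is not installed; install it before running Anacapa", pkg))
  }
  installed <- utils::packageVersion(pkg)
  # package_version objects can be compared directly with a version string
  if (installed != required) {
    warning(sprintf(
      "%s %s is installed but %s is expected; downstream output may differ",
      pkg, as.character(installed), required
    ))
  }
  # Return the installed version string invisibly so callers can log it
  invisible(as.character(installed))
}
```

In the script, the whole version-checking block could then be replaced by a single call such as `check_pkg_version("dada2", .dada_version)`. The trade-off is the one described above: the run continues with your dada2 version, so results may still differ from what Anacapa was tested with.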

zjgold commented 3 years ago

Hi @ucbtmae,

Yes, the issue is that Anacapa, as currently configured, only works with dada2 version 1.6.0. Because the packages keep being updated and changed, we created a Singularity container that should have all of the dependencies in place at the correct versions. This way you should be able to just download the Singularity container and get it up and running on your computer or local server. It is a little trickier to set up on an HPC, because every HPC is configured differently; you will likely need to work with the administrator/manager of the system to make sure it can run the container and is configured correctly (we ran into this 6 months ago).

Two options: ensure you have the correct version of each dependency manually, or install the toolkit using the container.
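For the manual route, a small report comparing installed R package versions against the versions Anacapa expects can make mismatches obvious before a run. This is a sketch: `dependency_report()` is a hypothetical name, and the only requirement this thread confirms is dada2 1.6.0, so any other entries would have to come from Anacapa's own documentation. It also only covers R packages, not command-line dependencies such as cutadapt.

```r
# Hypothetical helper (not part of Anacapa): compare installed R package
# versions with a named vector of required versions.
dependency_report <- function(required) {
  pkgs <- names(required)
  # Look up each package's installed version, or NA if it is missing
  installed <- vapply(pkgs, function(p) {
    if (requireNamespace(p, quietly = TRUE)) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
  installed <- unname(installed)
  data.frame(
    package = pkgs,
    required = unname(required),
    installed = installed,
    # TRUE only for an exact version match; missing packages are FALSE
    ok = !is.na(installed) & installed == unname(required),
    stringsAsFactors = FALSE
  )
}

# Usage: the only version requirement stated in this thread
print(dependency_report(c(dada2 = "1.6.0")))
```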

Hope that helped!

FabianRoger commented 3 years ago

Edit: it worked in the end.

I am trying to install the container in VirtualBox on a Mac (Big Sur) but wasn't successful.

See https://github.com/datproject/anacapa-container/issues/6