sdparekh / zUMIs

zUMIs: A fast and flexible pipeline to process RNA sequencing data with UMIs
GNU General Public License v3.0

Error with yaml file #337

Closed by MartaBenegas 1 year ago

MartaBenegas commented 1 year ago

Hi team!

I've just generated a yaml file using the shiny app but it raises an error:

root@f810632254f2:/usr/bin/zUMIs-2.9.7# bash zUMIs.sh -y /data/yaml
zUMIs.sh: line 7: curl: command not found
------------- 

 Good news! A newer version of zUMIs is available at https://github.com/sdparekh/zUMIs 

-------------
YAML file has an error. Look at the zUMIs_YAMLerror.log or contact developers.

The error log file: image

Every entry is NULL, but the files do exist at the paths specified in the yaml file: image

The yaml file (I've just added .txt at the end so I can upload it to GitHub): yaml.txt

Desktop (please complete the following information):

root@f810632254f2:/data# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:    20.04
Codename:   focal

I'm inside a docker image.

cziegenhain commented 1 year ago

Hi,

Please set the barcode file path to ~; that should take care of this error.
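
In YAML, `~` denotes a null value, so the field is treated as unset. A minimal sketch of making that change from the shell, assuming the key is named `barcode_file` (an assumption; check the name in your own file) and working on a throwaway demo file rather than the real config:

```shell
# Demo only: a throwaway YAML fragment. The key name barcode_file and the
# path below are assumptions for illustration, not taken from the real file.
demo_yaml=$(mktemp)
printf 'barcodes:\n  barcode_file: /data/some_barcodes.txt\n' > "$demo_yaml"

# Replace whatever follows "barcode_file:" with ~ (YAML null), keeping the indentation.
sed -i.bak -E 's|^([[:space:]]*barcode_file:).*$|\1 ~|' "$demo_yaml"

# Print the resulting line for a visual check.
barcode_line=$(grep 'barcode_file' "$demo_yaml")
echo "$barcode_line"
```

Editing the line by hand in a text editor achieves the same thing; the `sed` call is only a convenience.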

Other comments on your YAML file: I am assuming this is 10x-ish data.

Best, C

MartaBenegas commented 1 year ago

Dear @cziegenhain,

Sorry to bother you again, but I haven't been able to run zUMIs yet. I fixed that issue following your recommendation and the analysis started running, but it failed in the end.

My Dataset

The dataset I'm using is this one from 10x, for training purposes. In case you don't have access, here is a screenshot with the summary: image

It consists of three files:

I've also downloaded the shared_i7_barcodes.txt from here, under Single Index Plates > Chromium Single Cell v3, and reformatted it into tabular format.

zUMIs Analysis

I've run zUMIs with the following yaml and command:

$ bash zUMIs.sh -y /data/10x.yaml > zumis.log

10x.yaml.txt

Here you have the zumis.log with the standard output: zumis.log

However, the standard error was printed in the console. I don't know in which order those messages appeared. Sorry for the confusion, I thought that everything was going to be redirected to the file:

Warning message:
Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
i Please use `linewidth` instead. 
Warning message:
In BCbin(bccount_file = paste0(opt$out_dir, "/", opt$project, ".BCstats.txt"),  :
  NAs introduced by coercion
Warning messages:
1: In fread(gtf, sep = "\t", header = F) :
  Stopped early on line 209002. Expected 9 fields but found 1. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<##sequence-region NT_166280.1 1 169725>>
2: In parallel::mclapply(gtf_info, function(x) { :
  all scheduled cores encountered errors in user code
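
The first `fread` warning points at a `##sequence-region` line in the middle of the annotation, which suggests the file contains GFF-style directive lines beyond the header. A minimal sketch for counting such lines and writing a copy without them, shown on a throwaway demo file. Whether these lines are what makes the later step fail is not established here:

```shell
# Demo only: a tiny stand-in annotation with a directive line in the middle.
demo_gtf=$(mktemp)
printf 'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\n##sequence-region NT_166280.1 1 169725\nchr1\tsrc\tgene\t200\t300\t.\t+\t.\tgene_id "g2";\n' > "$demo_gtf"

# Count lines that start with '#'.
n_comment=$(grep -c '^#' "$demo_gtf")
echo "comment/directive lines: $n_comment"

# Write a copy without those lines; the original file is left untouched.
clean_gtf=$(mktemp)
grep -v '^#' "$demo_gtf" > "$clean_gtf"
n_records=$(wc -l < "$clean_gtf" | tr -d ' ')
echo "records kept: $n_records"
```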

The output generated so far:

[ec2-user@ip-172-31-19-19 output]$ ll
total 19558032
-rw-r--r-- 1 root root   90619440 nov 15 13:49 10x.BCstats.txt
-rw-r--r-- 1 root root 4165088437 nov 15 13:57 10x.filtered.tagged.Aligned.out.bam
-rw-r--r-- 1 root root 4300882557 nov 15 13:58 10x.filtered.tagged.Aligned.out.bam.ex.featureCounts.bam.tmp
-rw-r--r-- 1 root root 4404796914 nov 15 14:44 10x.filtered.tagged.Aligned.out.bam.ex.featureCounts.bam.tmp.featureCounts.bam
-rw-r--r-- 1 root root 3435290589 nov 15 13:57 10x.filtered.tagged.Aligned.toTranscriptome.out.bam
-rw-r--r-- 1 root root       4053 nov 15 13:57 10x.filtered.tagged.Log.final.out
-rw-r--r-- 1 root root 2799163082 nov 15 13:49 10x.filtered.tagged.unmapped.bam
-rw-r--r-- 1 root root  831535963 nov 15 13:49 10x.final_annot.gtf
-rw-r--r-- 1 root root        277 nov 15 13:45 10x.zUMIs_runlog.txt
drwxr-xr-x 6 root root       4096 nov 15 13:49 zUMIs_output
[ec2-user@ip-172-31-19-19 output]$ ll zUMIs_output/
total 292
-rw-r--r-- 1 root root     25 nov 15 13:49 10x.BCbinning.txt
-rw-r--r-- 1 root root 141914 nov 15 13:49 10xkept_barcodes_binned.txt
-rw-r--r-- 1 root root 141914 nov 15 13:49 10xkept_barcodes.txt
drwxr-xr-x 2 root root   4096 nov 15 13:45 expression
drwxr-xr-x 2 root root   4096 nov 15 13:49 stats
[ec2-user@ip-172-31-19-19 output]$ ll zUMIs_output/stats/
total 36
-rw-r--r-- 1 root root 34918 nov 15 13:49 10x.detected_cells.pdf
[ec2-user@ip-172-31-19-19 output]$ ll zUMIs_output/expression/
total 0

Let me know if you'd like me to send any of the files.

Any help would be appreciated! I'm learning how to run zUMIs and I don't know if I specified the parameters correctly or the error is due to something else. I'd like to know if I'm doing things right before starting the AWS machine again.
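
For the next run, standard output and standard error can be captured in the same file, which also preserves the order of the messages. A minimal sketch of the redirection, using a hypothetical stand-in function instead of the real pipeline:

```shell
# Stand-in for a program that writes to both streams (hypothetical function).
fake_pipeline() {
  echo "progress message"      # goes to stdout
  echo "Warning message" >&2   # goes to stderr
}

log_file=$(mktemp)

# '> file' sends stdout to the file; '2>&1' then points stderr at the same place.
fake_pipeline > "$log_file" 2>&1

cat "$log_file"
```

Applied to the command above, this would read `bash zUMIs.sh -y /data/10x.yaml > zumis.log 2>&1`.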

Thanks!

cziegenhain commented 1 year ago

Hi,

It sounds like the shared i7 merging step is throwing the error, which could be related to the formatting of the file you provide there. Since this is a single 10x sample, you do not need to supply the neuron_1k_v3_S1_L001_I1_001.fastq.gz fastq file.
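
A quick, generic way to look for formatting problems in a hand-reformatted table is to check for Windows line endings and for rows with differing numbers of tab-separated fields. The sketch below runs on a throwaway demo file. The layout zUMIs expects for the shared i7 file is not stated in this thread, so this only checks internal consistency:

```shell
# Demo only: a stand-in table with one Windows line ending and one short row.
demo_table=$(mktemp)
printf 'A\tB\tC\nD\tE\tF\r\nG\tH\n' > "$demo_table"

# Count lines that end in a carriage return (CRLF line endings).
n_crlf=$(awk '/\r$/ { n++ } END { print n + 0 }' "$demo_table")
echo "lines with CR: $n_crlf"

# Tabulate how many lines have each number of tab-separated fields.
field_counts=$(awk -F '\t' '{ c[NF]++ } END { for (k in c) print k, c[k] }' "$demo_table" | sort -n)
echo "$field_counts"
```

A clean table would report zero lines with CR and a single field count shared by all rows.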

Best, Christoph