Closed Dazcam closed 3 years ago
Hi, MAESTRO uses STARsolo for scRNA-seq quantification. For single-nuclei data, you can add --soloFeatures GeneFull by manually editing the Snakefile after you initialize it, at https://github.com/liulab-dfci/MAESTRO/blob/master/MAESTRO/Snakemake/scRNA/Snakefile#L48
In the future, we should expose that as a parameter in the config.yaml file.
Thanks!
Many thanks for responding. I will add that command to the Snakefile today and see if it runs to completion. The pipeline hit the skids after the scrna_rseqc_genecov rule. Although that rule completed without error, the logs reported the following warning:
Cannot get coverage signal from 14510_PFC_RNAAligned.sortedByCoord.out.sample.bam ! Skip
Sample Skewness
@ 2021-01-09 00:14:17: Running R script ...
Likely a mismatch between the BED and BAM files. This caused the pipeline to choke during the scrna_rseqc_plot rule, as the RNAGenebodyCoveragePlot could not be generated.
Error in `[.data.frame`(gene_cov, , 2) : undefined columns selected
Calls: RNAGenebodyCoveragePlot -> [ -> [.data.frame
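If the cause really is a BED/BAM mismatch, one common form is a difference in chromosome naming (for example `1` versus `chr1`). The following is only a minimal sketch for checking that, not part of MAESTRO: the helper names `bed_chromosomes` and `missing_from_bam` are hypothetical, and the BAM reference names are assumed to have been read from the BAM header beforehand (for instance with `samtools view -H`).

```python
def bed_chromosomes(bed_lines):
    """Collect chromosome names from the first column of BED records."""
    names = set()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the optional header lines a BED file may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def missing_from_bam(bed_lines, bam_references):
    """Return the BED chromosome names that are absent from the BAM header."""
    return sorted(bed_chromosomes(bed_lines) - set(bam_references))


# Toy data (made up for illustration): Ensembl-style names in the BED file,
# UCSC-style names in the BAM header.
bed = ["1\t11868\t14409\tgene_a\t0\t+", "2\t38813\t46588\tgene_b\t0\t-"]
bam_refs = ["chr1", "chr2"]
print(missing_from_bam(bed, bam_refs))
```

A non-empty result would mean that those BED records can never overlap any alignment, which would be consistent with the "Cannot get coverage signal" warning.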
I also had a buffer size issue. I assume this is due to my samples being sequenced extremely deeply?
EXITING because of fatal error: buffer size for SJ output is too small
Solution: increase input parameter --limitOutSJcollapsed
I managed to solve it by adding the following line to the shell command of the scrna_map rule.
--limitOutSJcollapsed 5000000
Source here. May be worth adding this somewhere in config or docs?
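As a rough illustration of what exposing this in the config could look like, here is a minimal Python sketch (Snakefiles accept plain Python). The config key `star_extra_args` and the helper `star_command` are hypothetical and do not exist in MAESTRO, and the base command is shortened for the example.

```python
import shlex


def star_command(base_args, config):
    """Build a shell command string, appending optional extra STAR arguments.

    `star_extra_args` is a hypothetical config key holding extra options
    as a single string; it is not a real MAESTRO setting.
    """
    extra = shlex.split(config.get("star_extra_args", ""))
    return " ".join(shlex.quote(arg) for arg in list(base_args) + extra)


# In Snakemake, `config` is the dictionary loaded from config.yaml.
config = {"star_extra_args": "--limitOutSJcollapsed 5000000"}
cmd = star_command(["STAR", "--runMode", "alignReads"], config)
print(cmd)
```

With the key left out of the config, the command is returned unchanged, so existing projects would keep their current behaviour.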
Are you planning on adding ssclusteval to the pipeline?
UPDATE: 13th Jan 2021
When running with the --soloFeatures GeneFull parameter, the directory names of some of the output files change so that they no longer match what is specified in the Snakefile.
Instead of: Result/STAR/%sSolo.out/Gene/raw/matrix.mtx
They are stored in Result/STAR/%sSolo.out/GeneFull/raw/matrix.mtx
I think this only affects the scrna_map and scrna_qc rules.
Error message:
MissingOutputException in line 21 of /scratch/c.c1477909/maestro_analysis/14510_PFC_RNAv2/Snakefile:
Job Missing files after 5 seconds:
Result/STAR/14510_PFC_RNASolo.out/Gene/raw/matrix.mtx
Result/STAR/14510_PFC_RNASolo.out/Gene/raw/features.tsv
Result/STAR/14510_PFC_RNASolo.out/Gene/raw/barcodes.tsv
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 0 completed successfully, but some output files are missing. 0
Removing output files of failed job scrna_map since they might be corrupted:
Result/STAR/14510_PFC_RNAAligned.sortedByCoord.out.bam, Result/STAR/14510_PFC_RNAAligned.sortedByCoord.out.bam.bai
Shutting down, this might take some time.
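One way to keep the expected outputs in step with the chosen feature is to derive the paths from the --soloFeatures value instead of hard-coding `Gene`. This is a minimal sketch under that assumption; the helper `solo_raw_outputs` is hypothetical and not taken from the MAESTRO Snakefile.

```python
def solo_raw_outputs(sample, solo_feature="Gene"):
    """Return the raw STARsolo matrix files expected for one sample.

    STARsolo writes its count matrices under a subdirectory named after
    the --soloFeatures value (for example Gene or GeneFull).
    """
    raw_dir = f"Result/STAR/{sample}Solo.out/{solo_feature}/raw"
    return [f"{raw_dir}/{name}" for name in ("matrix.mtx", "features.tsv", "barcodes.tsv")]


# Paths for the sample in the error message, with and without GeneFull.
for path in solo_raw_outputs("14510_PFC_RNA", "GeneFull"):
    print(path)
```

The same list could then be used for the output of the mapping rule and the input of the QC rule, so both follow the feature setting.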
I have modified the Snakefile and am now running MAESTRO again.
Thanks for reporting. We will keep this in mind and include it in our next release!
Hi, we just made a new release, MAESTRO 1.5.1, which supports single-nuclei data. Can you please give it a try? Thanks!
Thanks for the update. Unfortunately, I had to abandon using MAESTRO due to the issues I was having around the time I posted. I now have a well-developed pipeline of my own for my single-nuclei data, but I will keep an eye on MAESTRO's development and may consider using it in the future.
Thanks for the feedback!
I got the same error when running the newest version, 1.5.4 (only available on the macs3 fork), for the multiome pipeline:
EXITING because of fatal error: buffer size for SJ output is too small
Solution: increase input parameter --limitOutSJcollapsed
I have yet to try the solution previously proposed. Will let you know.
Hello,
I'm currently installing the MAESTRO prerequisites and, after reading the paper, I'd like to ask if MAESTRO is compatible with 10X data derived from nuclear RNA, particularly if I'm looking to integrate single-modal snRNA- and snATAC-seq data?
And more specifically, could the use of a pre-mRNA reference and GTF files for alignment, as opposed to standard reference/annotation files, impact a MAESTRO analysis at all?
Until now I have been using Cell Ranger 4 for my analysis, which recommends using a pre-mRNA reference and GTF file for nuclear RNA. I had started creating STARsolo-compatible versions of these files for my MAESTRO analysis and wondered if this is the best course of action, particularly as 10X have recently released v5, which includes a new function for dealing with intronic reads without the need for a pre-mRNA reference, and STARsolo also provides a similar function.
Regardless, it would be useful to hear if you have any recommendations or points of interest that I should consider when running MAESTRO using single-nuclear data.
Many Thanks,
Darren