wenweixiong / MARVEL

replicating the example for data from droplet. #16

Open JFerrer-B opened 1 year ago

JFerrer-B commented 1 year ago

Hi, I am trying to replicate the results you obtained, and I ran into a versioning problem.

On the one hand, you use cellranger version 2.1.1, which by default uses STAR 2.5.1b. On the other hand, you use STAR version 2.7.8a to align the BAM files produced by cellranger count.

When doing this I got the following error: "EXITING because of FATAL ERROR: Genome version: 2.5.1 is INCOMPATIBLE with running STAR version: 2.7.8a"

I hope you can tell me how to solve this. Juan

wenweixiong commented 1 year ago

I think the issue doesn't lie with cell ranger using a different version of STAR. I think it's because the reference genome (.fa) was indexed with a different version of STAR, i.e., a version other than 2.7.8a. Solution: You may need to index the genome with STAR version 2.7.8a and then re-run cell ranger and STAR 2.7.8a.
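One way to check which STAR version built an existing index is to read the `versionGenome` entry that STAR writes to `genomeParameters.txt` inside the index directory. The helper below is a minimal sketch; the function name `genome_index_version` is hypothetical, and the directory paths in the usage note are assumptions, not paths from the tutorial.

```shell
# Hypothetical helper: print the versionGenome value recorded in a STAR index.
# Argument: path to a STAR genome directory (the one passed to --genomeDir).
genome_index_version() {
  params="$1/genomeParameters.txt"
  if [ ! -f "$params" ]; then
    echo "no genomeParameters.txt in $1" >&2
    return 1
  fi
  # The file is a whitespace-separated key/value table; take the first match.
  awk '$1 == "versionGenome" { print $2; exit }' "$params"
}
```

For example, `genome_index_version ./starsolo_index` prints the recorded value, which can be compared by eye with the output of `STAR --version`. For a cellranger reference, the STAR index is usually in a `star/` subdirectory of the reference folder (an assumption worth verifying for your cellranger version).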

JFerrer-B commented 1 year ago

Did you run cellranger mkref before running cellranger count?

JFerrer-B commented 11 months ago

In the tutorial, you say that you use cellranger v2.1.1 and STAR 2.7.8a. A reference genome built with this version of cellranger is not compatible with that STAR version. Yet, as the tutorial shows, you use the same reference for both cellranger and STAR. How did you build this reference? From your previous message, it is not clear to me what you did, so I cannot replicate it.

xujialiu commented 7 months ago

You are right: the version of the reference index does not match. You should therefore use STAR to build a new reference index with the following command:

STAR \
--runMode genomeGenerate \
--runThreadN 20 \
--genomeDir ./starsolo_index \
--genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa  \
--sjdbGTFfile Homo_sapiens.GRCh38.110.gtf

The files Homo_sapiens.GRCh38.110.gtf and Homo_sapiens.GRCh38.dna.primary_assembly.fa can be found on the following website: https://www.ensembl.org/Homo_sapiens/Info/Index
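If you prefer to fetch the two files from the command line, the following is a rough sketch. The URLs assume the usual Ensembl FTP layout for release 110 and are not taken from the tutorial, so please verify them before use; `fetch_reference` is a hypothetical helper name. The files are decompressed because STAR's genomeGenerate expects plain-text FASTA input.

```shell
# Assumed Ensembl FTP layout for release 110; verify these paths before relying on them.
ENSEMBL_RELEASE=110
ENSEMBL_BASE="https://ftp.ensembl.org/pub/release-${ENSEMBL_RELEASE}"
FASTA_URL="${ENSEMBL_BASE}/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"
GTF_URL="${ENSEMBL_BASE}/gtf/homo_sapiens/Homo_sapiens.GRCh38.${ENSEMBL_RELEASE}.gtf.gz"

# Hypothetical helper: download both files into the current directory and decompress them.
fetch_reference() {
  curl -fLO "$FASTA_URL" || return 1
  curl -fLO "$GTF_URL" || return 1
  gunzip -f "${FASTA_URL##*/}" "${GTF_URL##*/}"
}
```

Running `fetch_reference` in an empty directory should leave the two uncompressed files named as in the STAR command above.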

Hope my answer can help you.

Manoswini-02 commented 4 months ago

The reference used by Cell Ranger to generate the BAM file is already indexed. Do I need to index it again with STAR and repeat the run? I am also curious why STARsolo is used if the output of Cell Ranger can be used directly as input for SingleCellaR for normalization.

wenweixiong commented 4 months ago

The output of cellranger is used by SingleCellaR to generate the normalised gene expression matrix. This matrix is used as input for MARVEL for downstream analysis (please see "Normalised gene expression" section under the "Input files" section of the tutorial: https://wenweixiong.github.io/MARVEL_Droplet.html)

Ideally the reference genome file should be indexed with STAR so that it can be used by STARsolo.

STARsolo is used to obtain the raw gene count matrix and splice junction count matrix. These two matrices are used as inputs for MARVEL for computing percent spliced-in (PSI) values and other analyses downstream (please see "Gene counts" and "Splice junction counts" sections under the "Input files" section of the tutorial: https://wenweixiong.github.io/MARVEL_Droplet.html).