The question about .config file

houruiyan commented 3 years ago

Hi, thanks for the great tool. I am trying to use it in my project. I have 10x data, which I aligned to the human reference with cellranger, so I now have a BAM file and want to set up the .config file. However, the configuration does not seem friendly to input files other than SICILIAN output, and I cannot work out how to write the input_file and meta file. Could you please give me some examples? I also do not understand the definitions of "grouping_level_1" and "grouping_level_2"; could you explain them? Thank you in advance!

kaitlinchaung commented 3 years ago

Hello! Thank you for your question.

It sounds like you have some cellranger-aligned BAMs and have not run SICILIAN on them, is that correct? In that case, I think you would want to set the following options:

SICILIAN = false
samplesheet = "YOUR_SAMPLESHEET_HERE.csv"

For 10X data, I would follow the instructions in the first block to create the samplesheet: https://github.com/salzmanlab/SpliZ#samplesheets You should have 2 comma-separated columns:

the name of the bam file (translates to the bam_ID)
the path to that bam file
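A minimal sketch of writing such a samplesheet in Python, in case it helps. The sample names and BAM paths are hypothetical placeholders, and writing no header row is an assumption; please check the README section linked above for the exact layout expected.

```python
import csv
import io


def write_samplesheet(rows, handle):
    """Write (bam_id, bam_path) pairs as two comma-separated columns."""
    writer = csv.writer(handle, lineterminator="\n")
    for bam_id, bam_path in rows:
        writer.writerow([bam_id, bam_path])


# Hypothetical sample names and paths, for illustration only.
rows = [
    ("sample1", "/path/to/sample1/possorted_genome_bam.bam"),
    ("sample2", "/path/to/sample2/possorted_genome_bam.bam"),
]

# Written to an in-memory buffer here; pass an open file handle instead
# (e.g. open("samplesheet.csv", "w", newline="")) to create a real file.
buffer = io.StringIO()
write_samplesheet(rows, buffer)
samplesheet_text = buffer.getvalue()
print(samplesheet_text, end="")
```

The first column is what the cell_id values in the metadata are built from, so it is worth keeping these names short and unique.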

For the metadata, that file should have at least 3 columns:

cell_id: formatted as ${bamID}${cellranger_barcode}
grouping_level_1: the metadata unit within which you would like to perform the differential analysis
grouping_level_2: the metadata unit whose values you would like to compare in the differential analysis
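As a rough illustration, the metadata could be assembled like this. The barcodes, sample names, and group labels are made up; the cell_id is built by plain concatenation because that is what the format above shows, and the tab delimiter is an assumption, so adjust both if your setup differs.

```python
import csv
import io


def make_cell_id(bam_id, barcode):
    # Plain concatenation, following the ${bamID}${cellranger_barcode}
    # format quoted above. Whether a separator is expected is not
    # confirmed here.
    return f"{bam_id}{barcode}"


def write_metadata(cells, handle):
    """Write one row per cell: cell_id plus the two grouping columns."""
    # Tab-delimited output is an assumption.
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["cell_id", "grouping_level_1", "grouping_level_2"])
    for bam_id, barcode, level_1, level_2 in cells:
        writer.writerow([make_cell_id(bam_id, barcode), level_1, level_2])


# Hypothetical cells: (bam_id, cellranger barcode, tissue, cell type).
cells = [
    ("sample1", "AAACCTGAGAAACCAT-1", "lung", "endothelial"),
    ("sample1", "AAACCTGAGAAACCGC-1", "lung", "blood"),
    ("sample2", "AAACCTGAGAAACGAG-1", "kidney", "capillary"),
]

buffer = io.StringIO()
write_metadata(cells, buffer)
metadata_text = buffer.getvalue()
print(metadata_text, end="")
```

The bam_id used here should match the first column of the samplesheet, so that each cell_id can be traced back to its BAM.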

It is possible that you only have one grouping over which you'd like to perform differential analysis (grouping_level_2), in which case you can leave grouping_level_1 blank, and your metadata would look like:

cell_id: formatted as ${bamID}${cellranger_barcode}
grouping_level_2: the metadata unit whose values you would like to compare in the differential analysis

An example I can provide is data from multiple tissue values (e.g. lung, kidney, and heart) with multiple cell_type values (e.g. endothelial, blood, capillary) within each tissue.

If grouping_level_1 = tissue and grouping_level_2 = cell_type, then you would be looking for differential SpliZ in endothelial vs blood vs capillary FOR EACH tissue.
If grouping_level_2 = cell_type and there is no grouping_level_1, then you would be looking for differential SpliZ in endothelial vs blood vs capillary, irrespective of tissue.
If grouping_level_2 = tissue and there is no grouping_level_1, then you would be looking for differential SpliZ in lung vs kidney vs heart, irrespective of cell_type.
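To make the grouping logic concrete, here is a small toy sketch. It is not SpliZ's own code: `planned_comparisons` is a hypothetical helper that only lists which grouping_level_2 values end up being compared against each other, separately within each grouping_level_1 value when one is given.

```python
from collections import defaultdict


def planned_comparisons(cells, level_2, level_1=None):
    """Map each level_1 value to the sorted level_2 values compared within it.

    When level_1 is None, all cells fall into a single "all cells" group.
    """
    groups = defaultdict(set)
    for cell in cells:
        key = cell[level_1] if level_1 else "all cells"
        groups[key].add(cell[level_2])
    return {key: sorted(values) for key, values in groups.items()}


# Hypothetical per-cell metadata.
cells = [
    {"tissue": "lung", "cell_type": "endothelial"},
    {"tissue": "lung", "cell_type": "blood"},
    {"tissue": "kidney", "cell_type": "endothelial"},
    {"tissue": "kidney", "cell_type": "capillary"},
]

# Cell types compared separately within each tissue.
per_tissue = planned_comparisons(cells, level_2="cell_type", level_1="tissue")

# Cell types compared across all cells, irrespective of tissue.
cell_types_only = planned_comparisons(cells, level_2="cell_type")

# Tissues compared across all cells, irrespective of cell type.
tissues_only = planned_comparisons(cells, level_2="tissue")

print(per_tissue)
print(cell_types_only)
print(tissues_only)
```

In this toy data, only the cell types actually present in a tissue appear in that tissue's comparison.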

I hope that helps, and feel free to paste in your config file/metadata/samplesheets to check. And thanks again for your question, I'll update the readme to clarify the parameters a bit.

houruiyan commented 3 years ago

Thank you very much! Your explanation is very clear! I wrote the .config file and built the metadata and samplesheet according to your instructions. There is one more point worth noting: when using BAM files, we do not need to set a value for "input_file". I think it works. This is my metadata.

This is my config.

But a new problem has appeared.

I don't know what is causing this problem. I hope you can help. Thank you!

kaitlinchaung commented 3 years ago

Can you please navigate to the 'Work dir' of that failed job, and paste the results of *.log?

The 'Work dir' path is located at the bottom of your second image, i.e. /storage/yhhuang/../work/..

kaitlinchaung commented 3 years ago

It may also be helpful to paste in a couple lines of your MS_ann_splices.tsv file.

houruiyan commented 3 years ago

Dear Dr Chaung,

This is my calc_splizvd.log in the "work dir":

This is the MS_ann_splices.tsv file in my "work dir"

Thank you!

kaitlinchaung commented 3 years ago

Hi, if the column names of your metadata file are grouping_level_1 and grouping_level_2, then your config file should have:

grouping_level_1 = "grouping_level_1"
grouping_level_2 = "grouping_level_2"
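A quick way to catch this kind of mismatch before launching the pipeline is to compare the config values against the metadata header. The sketch below is only an illustration: `missing_grouping_columns` is a hypothetical helper, not part of SpliZ, and it assumes a tab-delimited metadata file with a header row.

```python
import csv
import io


def missing_grouping_columns(metadata_handle, grouping_columns, delimiter="\t"):
    """Return the grouping column names that are absent from the metadata header."""
    reader = csv.reader(metadata_handle, delimiter=delimiter)
    header = next(reader, [])
    return [name for name in grouping_columns if name not in header]


# Hypothetical metadata content; in practice, open your metadata file instead.
metadata = io.StringIO(
    "cell_id\tgrouping_level_1\tgrouping_level_2\n"
    "sample1AAACCTGAGAAACCAT-1\tlung\tendothelial\n"
)

# Values as they would be set for grouping_level_1 / grouping_level_2 in the config.
matching = missing_grouping_columns(
    metadata, ["grouping_level_1", "grouping_level_2"]
)

metadata.seek(0)
# Config values that do not match any column name in the metadata.
mismatched = missing_grouping_columns(metadata, ["tissue", "cell_type"])

print(matching)
print(mismatched)
```

An empty list means every grouping name in the config matches a metadata column.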

houruiyan commented 3 years ago

ok, thank you very much! I will try it! Thank you again!

houruiyan commented 3 years ago

It works. Thank you!

kaitlinchaung commented 3 years ago

No problem!

wlei-amu commented 2 years ago

Hello, I want to run this tool on non-SICILIAN inputs, but I don't know what code to run. Can you show me yours? Thanks!

wlei-amu commented 2 years ago

If I need to configure the .config file, where should I modify it, and what code should I run? Thanks!

juliaolivieri commented 2 years ago

Hello @wlei-amu, what kind of data do you want to run on? 10X cellranger BAMs?

tjhwangxiong commented 2 years ago

> Hello @wlei-amu, what kind of data do you want to run on? 10X cellranger BAMs?

Dear juliaolivieri, I set up SpliZ as follows:

git clone https://github.com/salzmanlab/SpliZ.git
cd SpliZ
conda env create --name spliz_env --file=environment.yml
conda activate spliz_env
conda install nextflow

I have run the test data successfully by modifying small.config to set input_file = "small_data/small.pq".

Here I wonder: if we run SpliZ on 10X cellranger BAMs, which config file should we edit or generate? Can I just modify the nextflow.config file as follows:

// Global default params, used in configs
params {
  // Workflow flags for SpliZ
  // TODO nf-core: Specify your pipeline's command line flags
  // String values must be quoted in a Nextflow config
  dataname = "wx"
  input_file = "wx_1.bam"
  SICILIAN = false
  pin_S = 0.01
  pin_z = 0.0
  bounds = 5
  light = false
  svd_type = "normdonor"
  n_perms = 100
  grouping_level_1 = "grouping_level_1"
  grouping_level_2 = "grouping_level_2"
  libraryType = null
  run_analysis = false
  samplesheet = "samplesheet.csv"
  annotator_pickle = "hg38_refseq.pkl"
  exon_pickle = "hg38_refseq_exon_bounds.pkl"
  splice_pickle = "hg38_refseq_splices.pkl"
  meta = "metadata.tsv"
  gtf = "GRCh38_genomic.gtf"
  rank_quant = 0
  // Double quotes so that ${params.dataname} is interpolated
  outdir = "./results/${params.dataname}"
  publish_dir_mode = 'copy'
}

Or should I generate a new config file? If so, how should I load it?

Thanks a lot.

salzman-lab / SpliZ

The question about .config file #4