wenweixiong / MARVEL


Issues ComputePsi #17

Closed TdzBAS closed 1 year ago

TdzBAS commented 1 year ago

Hi Sean,

Thank you for your amazing tool. I have a dataset and have applied your pipeline from (https://wenweixiong.github.io/MARVEL_Plate.html). However, I'm encountering issues with the computepsi functions. The coordinates of the splice junctions are different. For the skipped exon case, you previously helped me by removing the minus 1 in the assignment of the end variable, and everything worked perfectly.

But for the other 4 common AS types, I haven't been able to implement it successfully.

I understand that the intron count matrix is still missing for retained introns. But it would be awesome if you could help me out with the other splicing types.

I have attached the input files and my code. The MARVEL object would be too big to share, since I used the whole human GTF file from GENCODE v43.

Thank you in advance for your assistance!

Best regards, Tolga

Attachment: marvel_test.zip

wenweixiong commented 1 year ago

Hi Tolga,

Could you please share the original output files from rMATS, i.e., fromGTF.SE.txt, fromGTF.MXE.txt, fromGTF.RI.txt, fromGTF.A5SS.txt, and fromGTF.A3SS.txt?

Sean

TdzBAS commented 1 year ago

Hi Sean,

Sure! I see that I did not use transcriptome.bam, but gene.bam.

Best, Tolga

Attachments: fromGTF.A3SS.txt, fromGTF.A5SS.txt, fromGTF.MXE.txt, fromGTF.RI.txt, fromGTF.SE.txt

wenweixiong commented 1 year ago

The rMATS files look good, Tolga.

I have updated MARVEL with functions to process the rMATS coordinates (the fromGTF.SE.txt, fromGTF.MXE.txt, fromGTF.RI.txt, fromGTF.A5SS.txt, and fromGTF.A3SS.txt files) into MARVEL splicing event metadata (SE_featureData.txt, MXE_featureData.txt, RI_featureData.txt, A5SS_featureData.txt, A3SS_featureData.txt). These files may be used directly for the "Splicing event metadata" section of the tutorial (https://wenweixiong.github.io/MARVEL_Plate.html).

Please install the latest version of MARVEL from GitHub to access these functions (v2.0.4, https://github.com/wenweixiong/MARVEL).

Then download the example data and script from Google Drive (https://drive.google.com/file/d/1ilhgUdRQYC2ee1fbA54ZdxbbtB8tlQ2m/view?usp=sharing).

Further details are under the v2.0.4 section on GitHub (https://github.com/wenweixiong/MARVEL).

TdzBAS commented 1 year ago

That sounds great! The added functions should make things more convenient. But unfortunately, the function is not found: [image]

I installed the latest version of MARVEL (v2.0.4).

Do you know why it couldn't find the function?

wenweixiong commented 1 year ago

I fixed a typo in the function in v2.0.4 a few weeks back. Could you please re-install v2.0.4?

library(devtools)
install_github("wenweixiong/MARVEL")
library(MARVEL)

The Preprocess_rMATS() function should then become available...

TdzBAS commented 1 year ago

Hi Sean,

Thanks for this very nice update! All preprocessing functions are working without an error. Now I want to add the retained introns part. For this you also provided some scripts, but where do I get the GRCh38.primary_assembly.genome_bedtools.txt? I am not familiar with bedtools.

Best, Tolga

wenweixiong commented 1 year ago

Hi Tolga,

The GRCh38.primary_assembly.genome_bedtools.txt is a tab-delimited file indicating the size of each chromosome. The 1st column is the chromosome name, and the 2nd column is the size in bp. The attached example is for the hg38 build and covers chr1-22, X, Y, and M: hg38.chrom.sizes_edit.txt

To be safe, you may run the following Samtools code to generate your own chromosome size file:

samtools faidx input.fa
cut -f1,2 input.fa.fai > sizes.genome

You may skip the first line if you have already generated the index file (.fai).
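A minimal sketch of the second step on a toy index, in case it helps to see the expected two-column layout before running it on a full genome. The file names toy.fa.fai and toy.sizes.genome are hypothetical, and the offset columns in the toy index are made-up placeholders; only the name and length columns matter here.

```shell
# Toy .fai standing in for the output of `samtools faidx input.fa`.
# Columns: name, length, byte offset, bases per line, bytes per line.
# The last three values are illustrative placeholders, not real offsets.
printf 'chr1\t248956422\t6\t60\t61\n'        >  toy.fa.fai
printf 'chrM\t16569\t253105708\t60\t61\n'    >> toy.fa.fai
printf 'KI270730.1\t112551\t253122566\t60\t61\n' >> toy.fa.fai

# Keep only the sequence name and its length, as in the command above.
cut -f1,2 toy.fa.fai > toy.sizes.genome

# Sanity check: each line should have exactly two tab-separated fields,
# and the second field should be a positive integer.
awk -F'\t' 'NF != 2 || $2 !~ /^[0-9]+$/ || $2 <= 0 { bad++ }
            END { print (bad ? "malformed" : "ok") }' toy.sizes.genome
```

The check is only a convenience; it does not verify that the names match those used in the BAM or GTF files.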

Sean

TdzBAS commented 1 year ago

Hi Sean,

Big thanks! This was the missing piece. I will get it up and running. Will keep you updated :)

When I run your command above and compare the output with the file you provided, I get the same results for the chromosomes. But I have extra lines like:

KI270730.1 112551
KI270731.1 150754
KI270732.1 41543
KI270733.1 179772
KI270734.1 165050
KI270735.1 42811
KI270736.1 181920
KI270737.1 103838
KI270738.1 99375
KI270739.1 73985
KI270740.1 37240
KI270741.1 157432

Do you have any idea what these could be?

Best, Tolga

wenweixiong commented 1 year ago

Excellent! These are scaffolds: genomic sequences that have not been assigned a definite position on a chromosome. Since your list of chromosomes differs from mine, you should use your list for the bedtools step so that it matches your reference genome file.
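One way to see exactly which sequence names differ between two chromosome size files is sketched below. The file names provided.sizes, mine.sizes, and extra_names.txt are hypothetical, and the toy contents are a small subset chosen for illustration.

```shell
# Toy inputs: a provided size file and one generated from your own reference.
printf 'chr1\t248956422\nchrM\t16569\n' > provided.sizes
printf 'chr1\t248956422\nchrM\t16569\nKI270730.1\t112551\nKI270731.1\t150754\n' > mine.sizes

# First pass (NR == FNR) records the names in provided.sizes;
# second pass prints names in mine.sizes that were not recorded.
awk -F'\t' 'NR == FNR { seen[$1] = 1; next }
            !($1 in seen) { print $1 }' provided.sizes mine.sizes > extra_names.txt

cat extra_names.txt
```

Any names listed are present only in your own file, which is expected when your reference genome includes scaffolds.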

TdzBAS commented 1 year ago

Hi @sean,

Just wanted to tell you that everything went perfectly fine! Thanks for updating the MARVEL package with these convenient new functions!

Best, Tolga