How are gene-counts and transcript-counts generated?

jaudoux / simct

A configurable generator of simulated RNA-Seq data that can emulate any specific biological mechanism and provide robust data sets covering cases such as fusion genes (or fusions).

http://cractools.gforge.inria.fr/softwares/simct/


How are gene-counts and transcript-counts generated? #1

Open FadelBerakdar opened 5 years ago

FadelBerakdar commented 5 years ago

Hi @jaudoux

I am wondering how gene-counts.tsv.gz and transcript-counts.tsv.gz are generated. Are they raw counts, or are they produced with a feature-counting tool? In my opinion, it would be very handy if the simulator could generate a SAM-like file containing all the reads, which could then be passed to HTSeq-count, for example, so that we have a more comparable truth set.

jaudoux commented 5 years ago

Hi @FadelBerakdar,

The gene-counts and transcript-counts files are generated directly from the FluxSimulator output, which handles the generation of transcription profiles and the in silico sequencing.
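To illustrate what "raw counts taken from the simulator" can mean in practice, here is a minimal sketch that sums per-transcript read counts into per-gene counts. It is only an illustration under stated assumptions, not necessarily how simCT does it internally: the function name, the input dictionaries, and the transcript-to-gene map are all hypothetical.

```python
from collections import defaultdict


def gene_counts_from_transcripts(transcript_counts, transcript_to_gene):
    """Sum simulated read counts per transcript into counts per gene.

    Both arguments are hypothetical inputs for this sketch:
    transcript_counts maps transcript id -> number of simulated reads,
    transcript_to_gene maps transcript id -> gene id.
    """
    gene_counts = defaultdict(int)
    for transcript_id, n_reads in transcript_counts.items():
        gene_counts[transcript_to_gene[transcript_id]] += n_reads
    return dict(gene_counts)
```

Because the counts come from the simulation itself, no read is lost to ambiguous or failed alignment, which is the main difference from counts produced by a feature-counting tool run on aligned reads.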

We could output BAM files with exact alignments, but this would bias any analysis you make of such BAM files, as they would not have been produced by an aligner...

Also, one thing worth mentioning is that the read names hold the true alignment: https://github.com/jaudoux/simct#read-name-encoding, if this can help.
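As a rough illustration of how a truth alignment carried in a read name could be turned into a SAM record, here is a minimal sketch. The read-name layout used here (`<read_id>:<chrom>,<pos>,<strand>,<cigar>`, with a 1-based position) is an assumption made for the example and is NOT the documented simCT encoding; check the read-name-encoding section linked above and adapt the parsing accordingly. The function name `read_to_sam` is also hypothetical.

```python
# Hypothetical read-name layout (an assumption, not the real simCT format):
#   <read_id>:<chrom>,<pos>,<strand>,<cigar>
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_to_sam(name, seq, qual):
    """Build one SAM line (11 mandatory fields) from a FASTQ record."""
    read_id, truth = name.split(":", 1)
    chrom, pos, strand, cigar = truth.split(",")
    flag = 0
    if strand == "-":
        # SAM stores reverse-strand reads reverse-complemented,
        # with the quality string reversed to match.
        flag = 16
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    fields = [
        read_id,        # QNAME
        str(flag),      # FLAG
        chrom,          # RNAME
        str(int(pos)),  # POS (assumed 1-based already)
        "255",          # MAPQ: 255 means "not available"
        cigar,          # CIGAR
        "*", "0", "0",  # RNEXT, PNEXT, TLEN left unset (single-end sketch)
        seq,
        qual,
    ]
    return "\t".join(fields)
```

For example, `read_to_sam("read7:chr1,1000,-,4M", "AACG", "IIHG")` yields a line with flag 16, sequence `CGTT` and qualities `GHII`. A real converter would also need a SAM header and handling for paired-end and spliced or chimeric reads.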

FadelBerakdar commented 5 years ago

Thanks for your reply, I will check the Flux documentation.

Out of curiosity, I wrote a simple Python script to convert it to SAM, and I will update you with our findings :))

jaudoux commented 5 years ago

Awesome, if you are willing to share the script as a gist, I can add it to the doc for other people to benefit from it ;)

FadelBerakdar commented 5 years ago

Yeah sure, that would be great. I will edit it to be more user-friendly and add it here.

FadelBerakdar commented 5 years ago

Hello, I apologize for the late reply. Lots of false positives/false negatives stuff in my life to handle. So I added a new repository to contain all the scripts I wrote or may write around benchCT/simCT. I have already tested them and I am using them to debug our new aligner, "novoSplice". Still, I would appreciate aggressive testing as well :P