Open FadelBerakdar opened 5 years ago
Hi @FadelBerakdar,
The gene-counts and transcript-counts files are generated directly from the FluxSimulator output; FluxSimulator handles the generation of transcription profiles and the in silico sequencing.
We could output BAM files with exact alignments, but this would bias any analysis you run on those BAM files, since they would not have been produced by an aligner...
Also, one thing worth mentioning is that the read names hold the true alignment: https://github.com/jaudoux/simct#read-name-encoding, if this can help.
Thanks for your reply, I will check the Flux documentation.
Out of curiosity, I wrote a simple Python script to convert it to SAM, and I will update you with our findings :))
Awesome, if you are willing to share the script as a gist, I can add it to the doc so that other people can benefit from it ;)
Yeah sure, that would be great. I will edit it to be more user-friendly and add it here.
Hello, I apologize for the late reply. Lots of false positive/false negative stuff in my life to handle. So I added a new repository to contain all the scripts I wrote or may write around benchCT/simCT. I have already tested them and I am using them to debug our new aligner, "novoSplice". Still, I would appreciate aggressive testing as well :P
Hi @jaudoux
I am wondering how gene-counts.tsv.gz and transcript-counts.tsv.gz are generated. Are they raw counts, or are they produced with a feature-counting tool? In my opinion, it would be very handy if the simulator could generate a SAM-like file containing all the reads, which could be passed to HTSeq-count for example, so that we would have a more comparable truth set.
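To make the comparison concrete, here is a minimal Python sketch of how a truth counts table could be loaded and compared against counts from another tool. It assumes the file is a gzipped, tab-separated table with the feature ID in the first column and the count in the second; the actual column layout of gene-counts.tsv.gz is not stated in this thread, so `id_col` and `count_col` are hypothetical parameters to adjust. The function names `load_counts` and `count_differences` are made up for illustration and are not part of simCT or benchCT.

```python
import csv
import gzip


def load_counts(path, id_col=0, count_col=1):
    """Read a gzipped, tab-separated counts table into {feature_id: count}.

    Assumption: one feature per row, ID and count in the given columns.
    Rows whose count field is not numeric (e.g. a header row) are skipped.
    """
    counts = {}
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) <= max(id_col, count_col):
                continue  # row too short to hold both fields
            try:
                counts[row[id_col]] = float(row[count_col])
            except ValueError:
                continue  # non-numeric count, most likely a header
    return counts


def count_differences(truth, estimate):
    """Return {feature_id: estimate - truth} over the union of features.

    A feature missing from one side is treated as having a count of 0.
    """
    ids = set(truth) | set(estimate)
    return {i: estimate.get(i, 0.0) - truth.get(i, 0.0) for i in ids}
```

With something like this, `count_differences(load_counts("gene-counts.tsv.gz"), estimate)` would give a per-gene signed error, where `estimate` is a dict built from the output of whichever counting tool is being evaluated.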