jaudoux / simct

A configurable generator of simulated RNA-Seq data that can emulate any specific biological mechanism and provide robust data sets covering cases such as fusion genes (or fusions).
http://cractools.gforge.inria.fr/softwares/simct/

found 0 transcripts #5

Open FadelBerakdar opened 4 years ago

FadelBerakdar commented 4 years ago

Problem: Sequence names in the FASTA files don't match the names used in the GTF file.

Log message:

Looking for FASTA references in genome_dir/References found :

[INFO] I am collecting information on the run. initializing profiler

[INFO] Reading error model 76 bases model

[WARN] The error model supports a read length of 76 but you are trying to create reads of length 100. We are scaling.

[INFO] Checking GTF file

[PROFILING] I am assigning the expression profile
Reading reference annotation OK (00:00:00)
found 0 transcripts

[PROFILING] Parameters
NB_MOLECULES 5000000
EXPRESSION_K -0.6
EXPRESSION_X0 9500.0
EXPRESSION_X1 9.025E7
PRO_FILE_NAME /home/zingo/simulation/dataset/FluxSimulator/fluxSimulator.pro

profiling  OK (00:00:00)
Updating .pro file   OK (00:00:00)
molecules   0

[ERROR] Profiler has no molecules!
java.lang.RuntimeException: Profiler has no molecules!
at barna.flux.simulator.SimulationPipeline.call(SimulationPipeline.java:438)
at barna.flux.simulator.SimulationPipeline.call(SimulationPipeline.java:54)
at barna.commons.launcher.Flux.main(Flux.java:198)

Cannot open dataset/FluxSimulator/fluxSimulator.fastq at /home/zingo/perl5/lib/perl5/CracTools/Utils.pm line 551.
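The mismatch described in the problem statement can be checked before running the simulation. The following is a minimal sketch, not part of simct: the helper names (`open_text`, `fasta_names`, `gtf_names`, `compare_names`) are hypothetical, and it assumes that the sequence name is the first word of each FASTA header and the first tab-separated column of each GTF line.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def fasta_names(path):
    """Return the set of sequence names (first word of each '>' header)."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_names(path):
    """Return the set of sequence names found in the first GTF column."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            # Skip blank lines and comment lines.
            if line.strip() and not line.startswith("#"):
                names.add(line.split("\t", 1)[0].strip())
    return names


def compare_names(fasta_path, gtf_path):
    """Return (names only in the GTF, names only in the FASTA), both sorted."""
    fasta = fasta_names(fasta_path)
    gtf = gtf_names(gtf_path)
    return sorted(gtf - fasta), sorted(fasta - gtf)
```

If the first list returned by `compare_names` is not empty, some annotated sequences have no FASTA counterpart, which would be consistent with the "found 0 transcripts" message above.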

FadelBerakdar commented 4 years ago

Solution: The FASTA splitter script below works fine. Set the path2MultiFasta and path2outputDir variables before running it.

```python
import gzip
import os

# Set these two paths before running the script.
path2MultiFasta = ""  # gzip-compressed multi-FASTA input file
path2outputDir = ""   # directory that receives one .fa file per sequence


def write_sequence(seqName, seqBuffer):
    """Write one sequence to <path2outputDir>/<seqName>.fa."""
    fileName = os.path.join(path2outputDir, seqName + ".fa")
    with open(fileName, "w") as out:
        # The header keeps only the sequence name, without the description.
        out.write(">" + seqName + "\n")
        out.writelines(seqBuffer)


seqName = ""
seqBuffer = []
seqLength = 0

# >mitochondrion_genome 19517
# mitochondrion_genome 19524

# Text mode ("rt") yields str lines, so no manual decoding is needed.
with gzip.open(path2MultiFasta, "rt", encoding="ascii") as file:
    for line in file:
        if line.startswith(">"):
            # Flush the previous sequence before starting a new one.
            if seqName:
                print(seqName, seqLength)
                write_sequence(seqName, seqBuffer)

            # Keep the first word of the header, without '>' or the newline.
            seqName = line[1:].split()[0]
            seqLength = 0
            seqBuffer = []
        else:
            seqBuffer.append(line)
            seqLength += len(line.rstrip("\r\n"))

# Flush the last sequence in the file.
if seqName:
    print(seqName, seqLength)
    write_sequence(seqName, seqBuffer)
```
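
Once the split files are written, it may be worth confirming that every sequence named in the GTF file has its own FASTA file. This is only a sketch: `missing_reference_files` is a hypothetical helper, and it assumes the simulator looks for one `<name>.fa` file per sequence name in the reference directory, which this thread does not confirm.

```python
import os


def missing_reference_files(gtf_path, fasta_dir, extension=".fa"):
    """Return the GTF sequence names that have no <name><extension> file."""
    names = set()
    with open(gtf_path) as gtf:
        for line in gtf:
            # Skip blank lines and comment lines.
            if line.strip() and not line.startswith("#"):
                names.add(line.split("\t", 1)[0].strip())

    # Strip the extension from each matching file to recover its name.
    present = {
        entry[: -len(extension)]
        for entry in os.listdir(fasta_dir)
        if entry.endswith(extension)
    }
    return sorted(names - present)
```

An empty list means every annotated sequence has a matching file. Any name that is returned points to a sequence whose FASTA name still differs from the one in the GTF file.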