found 0 transcripts

Problem: The sequence names in the FASTA files don't match the sequence names in the GTF file.
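A quick way to confirm this kind of mismatch is to list the sequence names on both sides and compare them. The sketch below is only an illustration: `open_text`, `fasta_names`, `gtf_names` and `report_mismatch` are hypothetical helper names, not part of simct or Flux Simulator. It assumes the FASTA name is the first word after `>` and that the GTF is tab-separated with the sequence name in the first column.

```python
import gzip


def open_text(path):
    """Open a plain or gzipped text file for reading (chosen by the .gz suffix)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def fasta_names(path):
    """Return the set of sequence names: the first word after '>' on each header."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_names(path):
    """Return the set of sequence names found in the first column of a GTF file."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            # Skip comment lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def report_mismatch(fasta_set, gtf_set):
    """Return (names only in the FASTA, names only in the GTF), both sorted."""
    return sorted(fasta_set - gtf_set), sorted(gtf_set - fasta_set)
```

Calling `report_mismatch(fasta_names(...), gtf_names(...))` with your own genome and annotation paths returns two lists. If both are non-empty, the two files use different names for the same sequences.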

Log message:

Looking for FASTA references in genome_dir/
References found :

Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_XIII => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_XIII.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_XIV => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_XIV.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_III => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_III.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_II => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_II.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_XII => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_XII.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_XVI => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_XVI.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_VIII => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_VIII.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_VII => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_VII.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_Mito => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_Mito.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_X => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_X.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_V => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_V.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_IX => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_IX.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_I => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_I.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_XI => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_XI.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_IV => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_IV.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_VI => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_VI.fa.gz
Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_XV => genome_dir/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.id_XV.fa.gz
Calculating references length
Loading annotations
Building GenomeSimulator (reading annotations)
Generate random mutations (ins,del,sub)
Generate random fusions
Generate the simulated genome as FASTA and GTF
Generate flux simulation
Flux-Simulator v1.2.1 (Flux Library: 1.22)

[INFO] I am collecting information on the run. initializing profiler

[INFO] Reading error model 76 bases model
[WARN] The error model supports a read length of 76 but you are trying to create reads of length 100. We are scaling.

[INFO] Checking GTF file
[PROFILING] I am assigning the expression profile
Reading reference annotation OK (00:00:00)
found 0 transcripts

[PROFILING] Parameters
NB_MOLECULES 5000000
EXPRESSION_K -0.6
EXPRESSION_X0 9500.0
EXPRESSION_X1 9.025E7
PRO_FILE_NAME /home/zingo/simulation/dataset/FluxSimulator/fluxSimulator.pro

profiling  OK (00:00:00)
Updating .pro file   OK (00:00:00)
molecules   0

[ERROR] Profiler has no molecules!
java.lang.RuntimeException: Profiler has no molecules!
at barna.flux.simulator.SimulationPipeline.call(SimulationPipeline.java:438)
at barna.flux.simulator.SimulationPipeline.call(SimulationPipeline.java:54)
at barna.commons.launcher.Flux.main(Flux.java:198)

Cannot open dataset/FluxSimulator/fluxSimulator.fastq at /home/zingo/perl5/lib/perl5/CracTools/Utils.pm line 551.

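Solution: The FASTA splitter script below works fine. It writes one FASTA file per sequence, named after the sequence, with a header that contains only the sequence name. Set the path2MultiFasta and path2outputDir variables before running it.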

```python
import gzip
import os

# Set these two variables before running the script.
path2MultiFasta = ""  # gzipped multi-FASTA input file
path2outputDir = ""   # existing directory that receives one .fa file per sequence


def write_sequence(out_dir, seq_name, seq_lines):
    """Write one sequence to <out_dir>/<seq_name>.fa with a name-only header."""
    file_name = os.path.join(out_dir, seq_name + ".fa")
    with open(file_name, "w") as out:
        out.write(">" + seq_name + "\n")
        out.writelines(seq_lines)


seqName = ""
seqBuffer = []
seqLength = 0

# "rt" opens the gzipped file in text mode, so each line arrives as str.
with gzip.open(path2MultiFasta, "rt") as fasta:
    for line in fasta:
        if line.startswith(">"):
            # New header: flush the previous sequence, if there is one.
            if seqName:
                print(seqName, seqLength)
                write_sequence(path2outputDir, seqName, seqBuffer)

            # Keep only the first word of the header as the sequence name.
            # split() also drops the trailing newline when the header has no spaces.
            seqName = line[1:].split()[0]
            seqBuffer = []
            seqLength = 0
        else:
            seqBuffer.append(line)
            seqLength += len(line.rstrip("\n"))

# Flush the last sequence in the file.
if seqName:
    print(seqName, seqLength)
    write_sequence(path2outputDir, seqName, seqBuffer)
```
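After running the splitter, it may be worth checking the output directory before launching the simulation again. The following is a minimal sketch, assuming the splitter wrote uncompressed `.fa` files named after the first word of each header; `check_split_dir` is a hypothetical helper, not something provided by simct.

```python
import os


def check_split_dir(out_dir):
    """Return a list of problems found in the per-sequence .fa files of out_dir.

    A file passes when it holds exactly one header line and the first word of
    that header equals the file name without the .fa suffix.
    """
    problems = []
    for file_name in sorted(os.listdir(out_dir)):
        if not file_name.endswith(".fa"):
            continue
        expected = file_name[:-len(".fa")]
        with open(os.path.join(out_dir, file_name)) as handle:
            # Collect the words of every header line in this file.
            headers = [line[1:].split() for line in handle if line.startswith(">")]
        if len(headers) != 1:
            problems.append(f"{file_name}: expected 1 header, found {len(headers)}")
        elif not headers[0] or headers[0][0] != expected:
            problems.append(f"{file_name}: header does not match the file name")
    return problems
```

An empty list means every file holds a single sequence whose header name equals the file name. Those names can then be compared by hand against the first column of the GTF file.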

jaudoux / simct

found 0 transcripts #5