nf-core / nascent

Nascent Transcription Processing Pipeline
https://nf-co.re/nascent
MIT License

groHMM fails with CHM13 occasionally #172

Open edmundmiller opened 1 month ago

edmundmiller commented 1 month ago

Description of the bug

Details

```console
Command output:
  Returning to R Enivorment :)
  $c.UTS..1.UTS...1.
  [1] 45.00000000  0.02222222 -1.00000000

  $c.0.5..10...1.
  [1]   0.2564279 264.2384571  -1.0000000

  $c.log.1...exp.LtProbA....LtProbA.
  [1] -0.006589323 -5.025597509

  $c.LtProbB..log.1...exp.LtProbB...
  [1] -100    0

  [1] "Input transcript annotations"
  [1] "Printing kg_tx......."
  GRanges object with 181708 ranges and 3 metadata columns:
             seqnames            ranges strand |        gene_id     tx_id
         [1]     chr1       15080-21429      + | LOC124905335_1         1
         [2]     chr1     205295-214212      + |      LINC01409         2
         [3]     chr1     205329-214212      + |      LINC01409         3
         [4]     chr1     220504-227987      + |   LOC124903817         4
         [5]     chr1     246378-248841      + |         FAM87B         5
         ...      ...               ...    ... .            ...       ...
    [181704]     chrY 26841747-26844871      - |   LOC107987352    181704
    [181705]     chrY 27005778-27047281      - |        REREP2Y    181705
    [181706]     chrY 27213049-27221044      - |   LOC105377244    181706
    [181707]     chrY 62439553-62441822      - |       WASIR1_1    181707
    [181708]     chrY 62449384-62451910      - |     DDX11L16_1    181708
             tx_name
         [1]
         [2]
         [3]
         [4]
         [5]
         ...     ...
    [181704]
    [181705]
    [181706]
    [181707]
    [181708]
    -------
    seqinfo: 24 sequences from an unspecified genome; no seqlengths
  [1] "Collapse annotations in preparation for overlap"
  [1] "Finished consensus annotations"
  [1] "repairing with annotations"
  564 transcripts are broken into 1269
  548 transcripts are broken into 1252

Command error:
  Warning message:
  'memory.limit()' is Windows-specific
  Import genomic features from the file as a GRanges object ... OK
  Prepare the 'metadata' data frame ... OK
  Make the TxDb object ... OK
  Warning messages:
  1: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID, :
    some transcripts have no "transcript_id" attribute ==> their name ("tx_name" column in the TxDb object) was set to NA
  2: In .extract_transcripts_from_GRanges(tx_IDX, gr, mcols0$type, mcols0$ID, :
    the transcript names ("tx_name" column in the TxDb object) imported from the "transcript_id" attribute are not unique
  3: In .find_exon_cds(exons, cds) :
    The following transcripts have exons that contain more than one CDS (only the first CDS was kept for each exon): NM_001134939.1, NM_001172437.2, NM_001184961.1, NM_001301020.1, NM_001301302.1, NM_001301371.1, NM_002537.3, NM_004152.3, NM_015068.3, NM_016178.2
  Reduce isoforms(22385) ... OK
  Truncate overlapped ranges ... OK
  Error in h(simpleError(msg, call)) :
    error in evaluating the argument 'x' in selecting a method for function 'end': subscript contains NAs
  Calls: combineTranscripts ... normalizeSingleBracketSubscript -> NSBS -> NSBS -> .subscript_error
  Execution halted
```

The failure is intermittent: with CHM13, groHMM sometimes completes and sometimes halts with the error above.

I don't want this edge case to hold up merging #165.
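
The warnings in the log above mention transcript records without a `transcript_id` attribute and transcript names that are not unique. Whether either is related to the `combineTranscripts` failure is not established here. As a first check on the annotation, a minimal sketch could count such records. It assumes a standard 9-column, tab-separated GTF, and the helper name `summarize_transcript_ids` is hypothetical (not part of nf-core/nascent or groHMM):

```python
import re
from collections import Counter

# Captures the value of the transcript_id attribute in the GTF attribute
# column, e.g. transcript_id "NM_002537.3";
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]*)"')


def summarize_transcript_ids(gtf_lines):
    """Count transcript records with a missing or repeated transcript_id.

    Hypothetical diagnostic helper for illustration only.
    Assumes 9 tab-separated GTF columns with the feature type in column 3.
    """
    missing = 0
    seen = Counter()
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only look at well-formed "transcript" feature rows.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None or match.group(1) == "":
            missing += 1
        else:
            seen[match.group(1)] += 1
    duplicated = sorted(tid for tid, count in seen.items() if count > 1)
    return {"missing": missing, "duplicated": duplicated}
```

The function accepts any iterable of lines, so an open file handle for the CHM13 GTF can be passed directly. A non-zero `missing` count or a non-empty `duplicated` list would only confirm what the warnings report; it would not by itself explain why the run fails intermittently.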

Command used and terminal output

No response

Relevant files

No response

System information

No response