How to deal with an incomplete data set and non-ENTREZ IDs (novel isoforms)

Hi there, I am trying this tool instead of my usual R packages, and so far I have had to modify some of the code in the documentation: https://spycone.readthedocs.io/en/latest/gene-level-workflow.html#Prepare-the-dataset. It would be great to have a more up-to-date version of it.

Now, my questions:

1. I normally do not use ENTREZ IDs (I use ENSEMBL IDs), and I am also working with novel isoforms, which means that not all of them have regular gene names. How could this be handled in the pipelines (transcript and gene level)?
2. I have 5 time points and 5 replicates per time point, but unfortunately one of the samples had to be removed from the data set due to quality issues. As a result, the function raises an error when I create the Spycone object. Is there a way to solve this?
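
To make the first question concrete, here is a minimal sketch of what I have in mind: use the gene name where one exists and fall back to the ID itself for novel isoforms. The annotation table, its column names, and the `MSTRG.1021` ID are made up for illustration. Whether `spy.dataset` accepts ENSEMBL or novel IDs in `gene_id` and `symbs` is exactly what I am unsure about, so this only shows the pandas side.

```python
import pandas as pd

# Hypothetical annotation table: the novel isoform has no gene name.
annot = pd.DataFrame({
    "gene_id": ["ENSG00000141510", "ENSG00000012048", "MSTRG.1021"],
    "gene_name": ["TP53", "BRCA1", None],
})

# Use the gene name where it exists, otherwise fall back to the ID itself.
annot["symbol"] = annot["gene_name"].fillna(annot["gene_id"])

gene_list = annot["gene_id"].tolist()  # candidate for gene_id=
symbs = annot["symbol"].tolist()       # candidate for symbs=
```

With this, every row has a non-empty symbol, and the novel isoform keeps a traceable identifier.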

Here is the error:
`Cell In[24], line 1
----> 1 tp5_dset = spy.dataset(ts=df_counts_sort,
      2                        gene_id = gene_list,
      3                        symbs=gene_list,
      4                        species=9606,
      5                        reps1 = 5,
      6                        timepts = 5)
File ~/miniconda3/envs/jypyTimeSeries/lib/python3.11/site-packages/spycone/DataSet.py:126, in dataset.__init__(self, ts, species, reps1, timepts, gtf, gene_id, transcript_id, timeserieslist, symbs, discretization_steps)
    123 self.ts[0] = np.array(self.ts[0], dtype="double")
    125 if self.timepts*self.reps1 != self.ts[0].shape[1]:
--> 126     raise ValueError("Number of columns is not the same as number of time points.")
    128 if self.species not in self.SPECIES:
    129     raise ValueError("Please provide a supported species ID.")
ValueError: Number of columns is not the same as number of time points.`
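
Judging from the check in the traceback, the number of columns has to equal `timepts * reps1`, so 24 columns can never pass with `reps1 = 5` and `timepts = 5`. One workaround I could imagine is padding the removed sample with the mean of the remaining replicates of the same time point. The sketch below is only that: the column names (`t{time}_r{replicate}`), the random counts, and the helper `pad_missing_replicates` are hypothetical, the column order spycone expects is an assumption, and an imputed replicate reduces the apparent variance at that time point, so I do not know if this is advisable.

```python
import numpy as np
import pandas as pd

timepts, reps = 5, 5

# Hypothetical column names "t{time}_r{replicate}"; t3_r4 was removed after QC.
expected = [f"t{t}_r{r}" for t in range(1, timepts + 1) for r in range(1, reps + 1)]
present = [c for c in expected if c != "t3_r4"]

# Made-up counts standing in for the real matrix (4 genes x 24 samples).
rng = np.random.default_rng(0)
df_counts = pd.DataFrame(
    rng.poisson(50, size=(4, len(present))).astype(float),
    index=[f"gene{i}" for i in range(4)],
    columns=present,
)

def pad_missing_replicates(df, expected_cols):
    """Fill each missing sample with the mean of its time point's other replicates."""
    out = df.copy()
    for col in expected_cols:
        if col not in out.columns:
            timepoint = col.split("_")[0]
            siblings = [c for c in df.columns if c.split("_")[0] == timepoint]
            out[col] = df[siblings].mean(axis=1)
    # Restore the full, ordered layout (grouped by time point here).
    return out[expected_cols]

df_padded = pad_missing_replicates(df_counts, expected)
print(df_padded.shape)
```

The padded frame has `timepts * reps` columns again, which is the condition the constructor checks. The alternative would be dropping one replicate from every time point and passing `reps1 = 4`, at the cost of discarding good samples.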

Thanks and all the best, Nicolas

yollct / spycone