serenolopezdarwin / apanalysis

Analyze 3'UTR usage in single-cell data.
MIT License

About example #4

Closed Renown-TAL closed 1 year ago

Renown-TAL commented 1 year ago

Hello! Could you provide example data for the workflow? I tried to run the pipeline, but most of the input files have no example format. Thanks in advance!

serenolopezdarwin commented 1 year ago

Please be more specific: which example formats are missing for which scripts/files? Most of the scripts are built to work with generic genomic file formats like BAMs, but some may need specific formats.

Renown-TAL commented 1 year ago

  1. At 'workflow.sh' line 11: # 5. A text file of baseline expression levels of mm10 genes named "mouse_median_expr.txt". I got the data from the paper, but I am using this pipeline for human data. Could you tell me whether the pipeline needs only the 'Median expression' and 'Ensembl Gene ID' columns?

  2. At 'workflow.sh' line 21: # 7. A csv file of cell annotations named "cell_annotations.csv". I don't know which annotations should be included or what format the file should have.

serenolopezdarwin commented 1 year ago
  1. "mouse_median_expr.txt" is in the format of a whitespace-separated file with columns [GENENAME EXPRESSION]

  2. "cell_annotations.csv" is a comma-separated file with columns [CELLNAME, EXONCOUNT, X, X, X, SAMPLEID, X, EXTRACTIONDATE, SAMPLEAGE, X, X, X, CLUSTERID, TSNE1, TSNE2, SUBCLUSTERID, SUBTSNE1, SUBTSNE2, X, X, X, X, CLUSTERNAME, TRAJECTORYNAME, UMAP1, UMAP2, UMAP3, REMAPTRAJECTORYNAME, REMAPUMAP1, REMAPUMAP2, REMAPUMAP3, SUBTRAJECTORYID, SUBUMAP1, SUBUMAP2, X, PSEUDOTIME]

The Xs are placeholder columns that don't need any data in them; the layout follows the column indexing used in matrixlengthandisoformanalysis.py. These dimensionality reductions and trajectories were generated with Monocle3, but you can generate them with Seurat or other software. The sub-tSNE and sub-UMAP columns are just dimensionality reductions recalculated on your data subset to each of your cell types.
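As a rough illustration of the two layouts described above, here is a minimal Python sketch that writes both files. It is not part of the repository: the function names (annotation_row, write_cell_annotations, write_median_expression) and all example values are hypothetical. It assumes neither file needs a header row, that empty strings are acceptable in the X columns, and that a tab is an acceptable whitespace separator. Check these assumptions against the indexing in matrixlengthandisoformanalysis.py before relying on them.

```python
import csv
import io

# Column order copied from the comment above; "X" marks an unused placeholder.
COLUMNS = [
    "CELLNAME", "EXONCOUNT", "X", "X", "X", "SAMPLEID", "X", "EXTRACTIONDATE",
    "SAMPLEAGE", "X", "X", "X", "CLUSTERID", "TSNE1", "TSNE2", "SUBCLUSTERID",
    "SUBTSNE1", "SUBTSNE2", "X", "X", "X", "X", "CLUSTERNAME", "TRAJECTORYNAME",
    "UMAP1", "UMAP2", "UMAP3", "REMAPTRAJECTORYNAME", "REMAPUMAP1", "REMAPUMAP2",
    "REMAPUMAP3", "SUBTRAJECTORYID", "SUBUMAP1", "SUBUMAP2", "X", "PSEUDOTIME",
]


def annotation_row(cell, placeholder=""):
    """Build one positional row from a dict keyed by column name.

    X columns and any field missing from `cell` receive `placeholder`.
    """
    return [
        placeholder if name == "X" else cell.get(name, placeholder)
        for name in COLUMNS
    ]


def write_cell_annotations(handle, cells, header=False):
    """Write one comma-separated row per cell to an open text handle.

    Assumption: no header row by default; pass header=True to emit one.
    """
    writer = csv.writer(handle)
    if header:
        writer.writerow(COLUMNS)
    for cell in cells:
        writer.writerow(annotation_row(cell))


def write_median_expression(handle, gene_to_expr):
    """Write whitespace-separated GENENAME EXPRESSION lines (tab assumed)."""
    for gene, expr in gene_to_expr.items():
        handle.write(f"{gene}\t{expr}\n")


if __name__ == "__main__":
    # Hypothetical example values, written to memory rather than to disk.
    demo = io.StringIO()
    write_cell_annotations(
        demo,
        [{"CELLNAME": "cell_1", "EXONCOUNT": 120, "CLUSTERID": 3,
          "UMAP1": 0.12, "UMAP2": -1.4, "PSEUDOTIME": 0.5}],
    )
    print(demo.getvalue(), end="")
```

To write to disk, pass a handle opened with open("cell_annotations.csv", "w", newline="") so the csv module controls line endings. Building each row from a name-keyed dict keeps the positions consistent, since the downstream script appears to read columns by index rather than by name.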

Let me know if there are any bugs with this.