single-cell-genetics / XClone

Detection of allele-specific subclonal copy number alterations from single-cell transcriptomic data.
https://xclone-cnv.readthedocs.io/en/latest/
Apache License 2.0

Missing documentation in xclone.pp.xclonedata #20

Closed KatharinaSchmid closed 3 weeks ago

KatharinaSchmid commented 3 weeks ago

Hi,

thanks for providing this interesting tool. I am having trouble running XClone with a count matrix that was not generated by xcltk. The documentation for xclone.pp.xclonedata does not describe what the file regions_anno_file should look like (or what the parameter data_notes is for):

xclonedata(Xmtx, data_mode, mtx_barcodes_file, regions_anno_file=None, genome_mode='hg38_genes', data_notes=None)
    Extracting `xcltk` output as anndata for the input of XClone.

    Parameters
    ----------

        Xmtx : csr_mtx or csr_mtx path
            The input data matrix/path; or a list of data matrix/paths to the matrix files.
        data_mode : str
            Mode of the data, either 'BAF' or 'RDR'.
        mtx_barcodes_file : str
            Path to the barcodes file.
        genome_mode : str, optional
            Genome mode, one of 'hg38_genes', 'hg38_blocks', 'hg19_genes', 
            'hg19_blocks', or 'mm10_genes'. Default is 'hg38_genes'.

When I try a comma-separated file with the columns described in your tutorial for the feature annotation, I get an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kschmid/miniconda3/envs/xclone/lib/python3.9/site-packages/xclone/preprocessing/_data.py", line 267, in xclonedata
    Xadata = AnnData(RDR, obs=cell_anno, var=regions_anno) # dtype='int32'
  File "/Users/kschmid/miniconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/anndata.py", line 254, in __init__
    self._init_as_actual(
  File "/Users/kschmid/miniconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/anndata.py", line 428, in _init_as_actual
    self._var = _gen_dataframe(
  File "/Users/kschmid/miniconda3/envs/xclone/lib/python3.9/functools.py", line 888, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/Users/kschmid/miniconda3/envs/xclone/lib/python3.9/site-packages/anndata/_core/aligned_df.py", line 65, in _gen_dataframe_df
    raise _mk_df_error(source, attr, length, len(anno))
ValueError: Observations annot. `var` must have as many rows as `X` has columns (17910), but has 17911 rows.

This is the header of the file I try to use for regions_anno_file at the moment:

GeneName,GeneID,chr,start,stop,arm,chr_arm,band
FAM138A,ENSG00000237613,1,34554,36081,p,1p,p36.33
OR4F5,ENSG00000186092,1,65419,71585,p,1p,p36.33
OR4F29,ENSG00000284733,1,450703,451697,p,1p,p36.33
OR4F16,ENSG00000284662,1,685679,686673,p,1p,p36.33
FAM87B,ENSG00000177757,1,817371,819837,p,1p,p36.33
FAM41C,ENSG00000230368,1,868071,876903,p,1p,p36.33
SAMD11,ENSG00000187634,1,923928,944581,p,1p,p36.33
NOC2L,ENSG00000188976,1,944204,959309,p,1p,p36.33
KLHL17,ENSG00000187961,1,960587,965715,p,1p,p36.33

Could you explain how I should do this instead? Thanks for your help.

Rongtingting commented 3 weeks ago

Hi @KatharinaSchmid, thank you for your question.

Are you using a count matrix generated by cellranger? I recently added a new function for reading count matrices generated by cellranger.

You may find the demo jupyter notebook useful: demo_GX109_scRNA_RDR_fromcellranger.ipynb

rdr_dir = "./data/rdr_cellranger/"
cell_anno_file = "./data/rdr_cellranger/cell_anno.tsv"
out_dir = "./result/"

RDR_adata = xclone.pp.readrdr_mtx(rdr_dir)

RDR_adata = xclone.pp.extra_anno(
    RDR_adata,
    cell_anno_file,
    barcodes_key = "cell",
    cell_anno_key = "cell_type",
    sep = "\t"
)

RDR_adata

As for your question about regions_anno_file: judging from the error log (ValueError: Observations annot. `var` must have as many rows as `X` has columns (17910), but has 17911 rows.), I guess the extra row is the column-name (header) line, which is being counted as a data row and therefore does not match the dataset.
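The off-by-one can be checked before calling xclonedata. The sketch below uses only the Python standard library and the first rows of the annotation file quoted above. It assumes that the annotation file is read without treating its first line as a header, which the thread does not confirm; `count_annotation_rows` and `n_features` are hypothetical names used for illustration only.

```python
import csv
import io

# Sample rows copied from the annotation file shown earlier in this thread.
REGIONS_CSV = """GeneName,GeneID,chr,start,stop,arm,chr_arm,band
FAM138A,ENSG00000237613,1,34554,36081,p,1p,p36.33
OR4F5,ENSG00000186092,1,65419,71585,p,1p,p36.33
OR4F29,ENSG00000284733,1,450703,451697,p,1p,p36.33
"""


def count_annotation_rows(text, has_header):
    """Count rows in CSV text, optionally excluding the first line as a header."""
    rows = list(csv.reader(io.StringIO(text)))
    return len(rows) - 1 if has_header else len(rows)


# Hypothetical: the number of columns (features) in the count matrix.
n_features = 3

# Assumption: a reader that does not expect a header counts the header as data.
rows_as_data = count_annotation_rows(REGIONS_CSV, has_header=False)
rows_skipping_header = count_annotation_rows(REGIONS_CSV, has_header=True)

print("header counted as data:", rows_as_data)
print("header skipped:", rows_skipping_header)
print("matrix columns:", n_features)
```

If the first count is exactly one larger than the number of matrix columns, as with 17911 versus 17910 in the traceback, the header line is the likely cause.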

Best, Rongting

Rongtingting commented 3 weeks ago

data_notes is any string you want to attach to this dataset.

By default, it records the time at which you create the dataset; if you pass a string, the time is appended to it.

_data.py#L278

if data_notes is None:
    data_notes = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
else:
    data_notes = data_notes + ": " + time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
Xadata.uns["data_notes"] = data_notes

KatharinaSchmid commented 3 weeks ago

Hey, thanks, this was indeed exactly what I was looking for. And it is working now :)