Downstream processing FICTURE output

pakiessling commented 7 months ago

Congratulations on the preprint!

I am very interested in this work, as I have run into similar problems with Baysor's memory footprint and with the segmentation of muscle cells and adipocytes.

One thing that is not entirely clear to me is the output format of FICTURE. In the documentation this appears to be a list of blocks (cells?) and factors.

Can I get a cell x gene matrix from FICTURE, as I would from Baysor? That would be quite important to me, e.g. to cluster the output, integrate it with scRNA-seq data, or run it through domain identification tools that need cell types and centroids.

Thank you!

Yichen-Si commented 7 months ago

Thanks for your interest. The output contains the X and Y coordinates of each pixel (let's ignore the block column for now) and the top factors from the inference. The results are already clustered if you consider each factor as a cluster. Currently we do not have a cell segmentation function, as segmentation is infeasible with only transcript information in most cases. I think the current output could be used with domain identification tools relatively easily, though. One way is to uniformly down-sample the pixels, treat the sampled pixels as "centroids", and use their factor assignments as cell types. An alternative is to use the "anchor"-level output generated together with the pixel-level output (with suffix "anchor.tsv.gz") and treat it as a cell by cell type matrix. Note that the anchors are much denser than cells, so you might still want to down-sample them. I think explicit/precise cell segmentation is not necessary for some downstream analyses.
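
For readers who want to try the first suggestion, here is a minimal sketch of the down-sampling idea. It is an illustration, not code from FICTURE. It assumes the pixel-level output has been loaded into a pandas DataFrame and that the coordinate and top-factor columns are named `X`, `Y` and `K1`. Those names, the helper `downsample_pixels` and the `spacing` parameter are assumptions made for this example; check the header of your own file. "Uniform" is read here as keeping one pixel per square grid cell, which is only one possible interpretation.

```python
import numpy as np
import pandas as pd


def downsample_pixels(pixels, spacing, x_col="X", y_col="Y", factor_col="K1"):
    """Keep one pixel per (spacing x spacing) grid cell.

    Hypothetical helper: the column names are assumptions, not a
    documented FICTURE schema. Returns a table with pseudo-centroid
    coordinates and the top factor relabelled as a cell type.
    """
    # Assign every pixel to a square grid cell of side length `spacing`.
    gx = np.floor(pixels[x_col].to_numpy() / spacing).astype(np.int64)
    gy = np.floor(pixels[y_col].to_numpy() / spacing).astype(np.int64)

    # Keep only the first pixel encountered in each grid cell.
    keep = ~pd.DataFrame({"gx": gx, "gy": gy}).duplicated().to_numpy()

    out = pixels.loc[keep, [x_col, y_col, factor_col]].reset_index(drop=True)
    return out.rename(columns={x_col: "x", y_col: "y", factor_col: "cell_type"})


# Synthetic stand-in for pixel-level output (made-up values).
demo = pd.DataFrame(
    {
        "X": [0.5, 1.5, 10.2, 11.0, 25.0],
        "Y": [0.5, 0.7, 3.0, 3.5, 25.0],
        "K1": [2, 2, 0, 1, 4],
    }
)
centroids = downsample_pixels(demo, spacing=10.0)
```

With a real file, the table could be read with `pandas.read_csv(..., sep="\t")` and the column names adjusted after inspecting the header. The resulting `x`, `y` and `cell_type` columns can then be passed to a domain identification tool in place of segmented cell centroids and cell-type labels. The same helper could be applied to the anchor-level table, again assuming it exposes coordinate columns and a top-factor column.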

pakiessling commented 7 months ago

Thank you!

seqscope / ficture

Downstream processing FICTURE output #3