seqscope / ficture

https://seqscope.github.io/ficture/
Other
21 stars 2 forks source link

Pixel level decoding & visualization error #10

Closed jamesboot closed 2 months ago

jamesboot commented 2 months ago

Hi,

Thanks for a great package. It's my first time implementing it on some of our data. When running it seems to run well, until the pixel level decoding & visualization. I get all the expected outputs up until this point. I get the following error message:

Traceback (most recent call last):
  File "/data/home/hmy961/ficture/bin/ficture", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/ficture/cli.py", line 43, in main
    function(sys.argv[2:])
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/ficture/scripts/slda_decode.py", line 139, in slda_decode
    pixel.X = pixel.X.map('{:.2f}'.format)
              ^^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/pandas/core/generic.py", line 6299, in __getattr__
    return object.__getattribute__(self, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'DataFrame' object has no attribute 'X'
391.06, 14327.23; -507.56, 19309.86
/opt/sge_spool/8.6.12b/sdx25/job_scripts/3689561: line 204: bgzip: command not found
/opt/sge_spool/8.6.12b/sdx25/job_scripts/3689561: line 206: tabix: command not found
Traceback (most recent call last):
  File "/data/home/hmy961/ficture/bin/ficture", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/ficture/cli.py", line 43, in main
    function(sys.argv[2:])
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/ficture/scripts/de_bulk.py", line 39, in de_bulk
    info = pd.read_csv(args.input,sep='\t',header=0)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
    self._engine = self._make_engine(f, self.engine)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1880, in _make_engine
    self.handles = get_handle(
                   ^^^^^^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/pandas/io/common.py", line 765, in get_handle
    handle = gzip.GzipFile(  # type: ignore[assignment]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/share/apps/centos7/python/gcc/12.1.0/3.11.6/lib/python3.11/gzip.py", line 174, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/data/home/hmy961/ficture_trial/tma2/outs/analysis/nF12.d_12/nF12.d_12.decode.prj_12.r_4_5.posterior.count.tsv.gz'
['0' '1' '2' '3' '4' '5' '6' '7' '8' '9' '10' '11']
Traceback (most recent call last):
  File "/data/home/hmy961/ficture/bin/ficture", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/ficture/cli.py", line 43, in main
    function(sys.argv[2:])
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/ficture/scripts/factor_report.py", line 88, in factor_report
    post = pd.read_csv(f, sep='\t')
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
    self._engine = self._make_engine(f, self.engine)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1880, in _make_engine
    self.handles = get_handle(
                   ^^^^^^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/pandas/io/common.py", line 765, in get_handle
    handle = gzip.GzipFile(  # type: ignore[assignment]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/share/apps/centos7/python/gcc/12.1.0/3.11.6/lib/python3.11/gzip.py", line 174, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/data/home/hmy961/ficture_trial/tma2/outs/analysis/nF12.d_12/nF12.d_12.decode.prj_12.r_4_5.posterior.count.tsv.gz'
07:04:00 PM Background color 000000
07:04:00 PM Read color table (12)
07:04:00 PM Read header {}
Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11'], dtype='object', name='Name')
Traceback (most recent call last):
  File "/data/home/hmy961/ficture/bin/ficture", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/ficture/cli.py", line 43, in main
    function(sys.argv[2:])
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/ficture/scripts/plot_pixel_full.py", line 65, in plot_pixel_full
    loader = BlockIndexedLoader(args.input, args.xmin, args.xmax, args.ymin, args.ymax, args.full, not args.org_coord, idtype=dty)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/ficture/loaders/pixel_factor_loader.py", line 47, in __init__
    self.xmax = min(xmax, self.meta["SIZE_X"])
                          ~~~~~~~~~^^^^^^^^^^
KeyError: 'SIZE_X'
07:04:00 PM Read header {}
Traceback (most recent call last):
  File "/data/home/hmy961/ficture/bin/ficture", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/ficture/cli.py", line 43, in main
    function(sys.argv[2:])
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/ficture/scripts/plot_pixel_single.py", line 43, in plot_pixel_single
    loader = BlockIndexedLoader(args.input, args.xmin, args.xmax, args.ymin, args.ymax, args.full, not args.org_coord)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/hmy961/ficture/lib/python3.11/site-packages/ficture/loaders/pixel_factor_loader.py", line 47, in __init__
    self.xmax = min(xmax, self.meta["SIZE_X"])
                          ~~~~~~~~~^^^^^^^^^^
KeyError: 'SIZE_X'

Here is my script:

# Load Python module
module load python/3.11.6

# Activate virtualenv
source /data/home/hmy961/ficture/bin/activate

# Define input folder
path=/data/home/hmy961/ficture_trial/tma2/outs

# Define scales etc.
mu_scale=1                                          # If your data's coordinates are already in micrometer
key=Count
major_axis=Y                                        # If your data is sorted by the Y-axis

# Create pixel mini batches 
batch_size=500
batch_buff=30
input=${path}/filtered.matrix.TMA2.tsv.gz
batch=${path}/batched.matrix.tsv

ficture make_spatial_minibatch --input ${input} \
--output ${batch} \
--mu_scale ${mu_scale} \
--batch_size ${batch_size} \
--batch_buff ${batch_buff} \
--major_axis ${major_axis}

sort -S 4G -k2,2n -k1,1g ${batch} | gzip -c > ${batch}.gz
rm ${batch}

# Prepare training mini batches
train_width=12                                      # \sqrt{3} x the side length of the hexagon (um)
min_ct_per_unit=50
input=${path}/filtered.matrix.TMA2.tsv.gz
out=${path}/hexagon.d_${train_width}.tsv

ficture make_dge --key ${key} \
--count_header ${key} \
--input ${input} \
--output ${out} \
--hex_width ${train_width} \
--n_move 2 \
--min_ct_per_unit ${min_ct_per_unit} \
--mu_scale ${mu_scale} \
--precision 2 \
--major_axis ${major_axis}

sort -S 4G -k1,1n ${out} | gzip -c > ${out}.gz      # Shuffle hexagons
rm ${out}

# Model training

# Parameters for initializing model
nFactor=12                                          # Number of factors
sliding_step=2
train_nEpoch=3
# train_width=12                                    # Should be the same as used in the above step
model_id=nF${nFactor}.d_${train_width}              # An identifier kept in output file names
min_ct_per_feature=20                               # Ignore genes with total count \< 20
R=10                                                # We use R random initializations and pick one to fit the full model
thread=${NSLOTS}                                    # Number of threads to use

# Model initialization

# Parameters
min_ct_per_unit_fit=20
cmap_name="turbo"

# Output identifiers
model_id=nF${nFactor}.d_${train_width}
output_id=${model_id}
output_path=${path}/analysis/${model_id}
figure_path=${output_path}/figure
if [ ! -d "${figure_path}/sub" ]; then
    mkdir -p ${figure_path}/sub
fi

# Input files
hexagon=${path}/hexagon.d_${train_width}.tsv.gz
pixel=${path}/filtered.matrix.TMA2.tsv.gz
feature=${path}/feature.clean.TMA2.tsv.gz

# Output
output=${output_path}/${output_id}
model=${output}.model.p

# Fit model
ficture fit_model --input ${hexagon} \
--output ${output} \
--feature ${feature} \
--nFactor ${nFactor} \
--epoch ${train_nEpoch} \
--epoch_id_length 2 \
--unit_attr X Y \
--key ${key} \
--min_ct_per_feature ${min_ct_per_feature} \
--test_split 0.5 \
--R ${R} \
--thread ${thread}

# Choose color
input=${output_path}/${output_id}.fit_result.tsv.gz
output=${figure_path}/${output_id}
cmap=${figure_path}/${output_id}.rgb.tsv

ficture choose_color --input ${input} \
--output ${output} \
--cmap_name ${cmap_name}

# Coarse plot for inspection
cmap=${figure_path}/${output_id}.rgb.tsv
input=${output_path}/${output_id}.fit_result.tsv.gz
output=${figure_path}/${output_id}.coarse
fillr=$((train_width/2+1))

ficture plot_base --input ${input} \
--output ${output} \
--fill_range ${fillr} \
--color_table ${cmap} \
--plot_um_per_pixel 1 \
--plot_discretized

# Pixel level decoding

# Parameters
fit_width=12                                                            # Often equal or smaller than train_width (um)
anchor_res=4                                                            # Distance between adjacent anchor points (um)
fit_nmove=$((fit_width/anchor_res))
anchor_info=prj_${fit_width}.r_${anchor_res}
radius=$(($anchor_res+1))
anchor_info=prj_${fit_width}.r_${anchor_res}                            # An identifier
coor=${path}/coordinate_minmax.tsv
cmap=${figure_path}/${output_id}.rgb.tsv

# Transform
output=${output_path}/${output_id}.${anchor_info}
ficture transform --input ${pixel} \
--output_pref ${output} \
--model ${model} \
--key ${key} \
--major_axis ${major_axis} \
--hex_width ${fit_width} \
--n_move ${fit_nmove} \
--min_ct_per_unit ${min_ct_per_unit_fit} \
--mu_scale ${mu_scale} \
--thread ${thread} \
--precision 2

# Pixel level decoding & visualization
prefix=${output_id}.decode.${anchor_info}_${radius}
input=${path}/batched.matrix.tsv.gz
anchor=${output_path}/${output_id}.${anchor_info}.fit_result.tsv.gz
output=${output_path}/${prefix}
topk=5                                                                  # Output only a few top factors per pixel

ficture slda_decode --input ${input} \
--output ${output} \
--model ${model} \
--anchor ${anchor} \
--anchor_in_um \
--neighbor_radius ${radius} \
--mu_scale ${mu_scale} \
--key ${key} \
--precision 0.1 \
--lite_topk_output_pixel ${topk} \
--lite_topk_output_anchor ${topk} \
--thread ${thread}

# Optional post-processing
input=${output_path}/${prefix}.pixel.tsv.gz                             # j, X, Y, K1, ..., KJ, P1, ..., PJ, J=topk
output=${output_path}/${prefix}.pixel.sorted.tsv.gz

K=$( echo $model_id | sed 's/nF\([0-9]\{1,\}\)\..*/\1/' )
while IFS=$'\t' read -r r_key r_val; do
    export "${r_key}"="${r_val}"
done < ${coor}
echo -e "${xmin}, ${xmax}; ${ymin}, ${ymax}"

offsetx=${xmin}
offsety=${ymin}
rangex=$( echo "(${xmax} - ${xmin} + 0.5)/1+1" | bc )
rangey=$( echo "(${ymax} - ${ymin} + 0.5)/1+1" | bc )
bsize=2000
scale=100
header="##K=${K};TOPK=3\n##BLOCK_SIZE=${bsize};BLOCK_AXIS=X;INDEX_AXIS=Y\n##OFFSET_X=${offsetx};OFFSET_Y=${offsety};SIZE_X=${rangex};SIZE_Y=${rangey};SCALE=${scale}\n#BLOCK\tX\tY\tK1\tK2\tK3\tP1\tP2\tP3"

(echo -e "${header}" && zcat ${input} | tail -n +2 | perl -slane '$F[0]=int(($F[1]-$offx)/$bsize) * $bsize; $F[1]=int(($F[1]-$offx)*$scale); $F[1]=($F[1]>=0)?$F[1]:0; $F[2]=int(($F[2]-$offy)*$scale); $F[2]=($F[2]>=0)?$F[2]:0; print join("\t", @F);' -- -bsize=${bsize} -scale=${scale} -offx=${offsetx} -offy=${offsety} | sort -S 4G -k1,1g -k3,3g ) | bgzip -c > ${output}

tabix -f -s1 -b3 -e3 ${output}
rm ${input}

# DE Analysis
max_pval_output=1e-3
min_fold_output=1.5
input=${output_path}/${prefix}.posterior.count.tsv.gz
output=${output_path}/${prefix}.bulk_chisq.tsv

ficture de_bulk --input ${input} \
--output ${output} \
--min_ct_per_feature ${min_ct_per_feature} \
--max_pval_output ${max_pval_output} \
--min_fold_output ${min_fold_output} \
--thread ${thread}

# Report (color table and top DE genes)
cmap=${output_path}/figure/${output_id}.rgb.tsv
output=${output_path}/${prefix}.factor.info.html

ficture factor_report --path ${output_path} \
--pref ${prefix} \
--color_table ${cmap}

# Make pixel level figures
cmap=${output_path}/figure/${output_id}.rgb.tsv
input=${output_path}/${prefix}.pixel.sorted.tsv.gz
output=${figure_path}/${prefix}.pixel.png

ficture plot_pixel_full --input ${input} \
--color_table ${cmap} \
--output ${output} \
--plot_um_per_pixel 0.5 \
--full

# Make single factor heatmaps
output=${figure_path}/sub/${prefix}.pixel

ficture plot_pixel_single --input ${input} \
--output ${output} \
--plot_um_per_pixel 0.5 \
--full \
--all
hyunminkang commented 2 months ago

Hi @jamesboot, it looks that the script suggest that your input matrix does not X/Y coordinates.

Have you tried run_together command? It might be easier to identify where the problem occurs using such an automated script first.

jamesboot commented 2 months ago

Hi @hyunminkang - thanks for the suggestion! I'll run the run_together command now and get back to you.

jamesboot commented 2 months ago

Hi @hyunminkang - thanks for suggestion again. I re-ran the analysis using the run_together command. At first I got an error, which seemed to be due to htslib not being available on my system. After installing/loading the htslib package the run_together command worked and I got all the outputs!

Thanks!

hyunminkang commented 2 months ago

Great! Thanks for the feedback.