schnitzer-lab / EXTRACT-public

EXTRACT is a tractable and robust automated cell extraction tool for calcium imaging, which extracts the activities of cells as time series from both one-photon and two-photon calcium imaging movies.
MIT License

Merged cells #5

Closed · rl72 closed this issue 2 years ago

rl72 commented 3 years ago

Hi again :)

I ran EXTRACT on 1P calcium imaging movies and noticed that multiple cells are often merged into big patches, each of which is identified as a single cell, even though I chose the correct avg_cell_radius. Is there any parameter in particular that I could play around with to try to solve this issue?

Thanks and best regards, Lumi

fatihdinc commented 3 years ago

Hey Lumi,

Thank you for the question. This is interesting; I can think of a few reasons why this can happen. But the fact that it happens on multiple cells, and quite often, tells me that you need to tune some of the internal parameters.

First things first, I suggest the following changes:

Start by setting config.max_iter=0. This disables the subsequent cell-refinement stage and returns only the results of the initial cell-finding stage. If you find that the resulting cells are merged, then it is an issue with cell finding and you need to change the preprocessing module parameters. If the cells are not merged, then the merging happens during cell refinement, and you need to change some parameters in that module.
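A minimal sketch of this diagnostic, assuming M is your movie matrix already loaded in MATLAB (get_defaults and extractor are the standard EXTRACT entry points; the avg_cell_radius value here is just an example setting):

```
config = [];
config = get_defaults(config);  % fill in default values for all parameters
config.avg_cell_radius = 6;     % set to your movie's average cell radius
config.max_iter = 0;            % skip cell refinement, keep cell-finding output only
output = extractor(M, config);  % then inspect output.spatial_weights for merging
```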

If the problem stems from the cell-finding process:

Set:

```
config.init_with_gaussian = 1;
config.adaptive_kappa = 1;
config.medfilt_outlier_pixels = 1;
```

Then, you could play around with the highpass and lowpass cutoff frequencies:

```
config.spatial_highpass_cutoff   % default 2
config.spatial_lowpass_cutoff    % default 5
```

Note that config.spatial_highpass_cutoff < config.spatial_lowpass_cutoff must hold at all times.
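For example (these particular values are purely illustrative, not recommendations):

```
config.spatial_highpass_cutoff = 4;  % up from the default of 2
config.spatial_lowpass_cutoff = 8;   % up from the default of 5; stays above the highpass cutoff
```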

If the problem stems from the cell refinement process:

Increase thresholds.T_dup_corr_thresh and thresholds.S_dup_corr_thresh, and decrease config.kappa_std_ratio (see the sketch below). If these do not fix the issue, I am happy to take a look at the movie myself.
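As a rough sketch (the specific values are hypothetical starting points, not documented defaults; tune them on your own movie):

```
config.thresholds.T_dup_corr_thresh = 0.98;  % raise to merge fewer traces as duplicates
config.thresholds.S_dup_corr_thresh = 0.98;  % raise to merge fewer cell images as duplicates
config.kappa_std_ratio = 0.5;                % decrease, as suggested above
```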

I hope these help!

Fatih

rl72 commented 3 years ago

Dear Fatih,

Thanks a lot for your advice; it was very helpful! The merging problem occurred mainly during the cell-finding phase (and the cell-refinement step somehow made it worse). Changing config.spatial_highpass_cutoff and config.spatial_lowpass_cutoff made a lot of difference; the result makes much more sense. But now I wonder whether the algorithm finds all the cells that it could...

Of course, if you would be so kind as to have a look at a short piece of my movie, I would be really grateful! Please find attached a link to the movie; I have preprocessed it the following way:

https://drive.google.com/file/d/1ewGY7CTQNgbosmFb9mK7BAl6fzZt1Hxq/view?usp=sharing

The EXTRACT parameters I found to be the best for now are:

```
config.use_gpu = 1;
config.avg_cell_radius = 6;
config.trace_output_option = 'nonneg';
config.num_partitions_x = 1;
config.num_partitions_y = 1;
config.cellfind_min_snr = 1;
config.T_min_snr = 1;
config.init_with_gaussian = 1;
config.adaptive_kappa = 1;
config.medfilt_outlier_pixels = 1;
config.spatial_highpass_cutoff = 8;
config.spatial_lowpass_cutoff = 10;
```

Please let me know what you think. Thanks a lot, Lumi

fatihdinc commented 3 years ago

Dear Lumi,

Thanks for your message. I can take a look at your movies after May 26th, which is the date of my quals :) But I am confident we can make this work well before then. If you send me an email at my Stanford address, perhaps we can arrange a quick Zoom call over which I can explain how to tune the parameters so that you get a better recall value. Let me try to explain here:

EXTRACT has a high precision value, as you probably realized. This is because of the threshold metrics we compute and then use to remove spurious cells. These metrics are optimized for a typical cortex movie; while they are quite robust empirically, as you move from area to area and image deeper, you might need to re-tune some of them, especially for challenging movies.

To check that all the cells were found during the cell-finding step, set max_iter=0. Then run the usual code and check the cells found by the algorithm as follows:

```
figure, imshow(max(M, [], 3), [])                   % maximum projection
plot_cells_overlay(output.spatial_weights, [], [])  % overlay the found cells
```

If you believe you haven't found all the cells here, gradually lower config.thresholds.T_min_snr (default 10) until you are satisfied with the recall value. The value of T_min_snr you find during this process is your new default for that type of movie.

In fact, if you plot the cell SNRs (check remove_redundant.m to see where the thresholds come into play; T_min_snr is the minimum-SNR threshold for a cell), you will see that spurious cells follow a Gaussian distribution. T_min_snr is usually chosen at the point where the null distribution starts deviating from a Gaussian. In a usual movie, this does not happen until SNR>>10, so the default value works well for a large variety of movies. In more challenging movies, cell SNRs come closer to the null distribution, which is when you need to tune this threshold for the specific movie. Here is how you can check the SNR distribution of all found cells (regardless of whether they were removed later on) from an already EXTRACTed output:

```
% fmap maps metric names to rows of the matrix containing the metrics
[fmap, ~] = get_quality_metric_map;

% metrics is a matrix that contains all the metrics we use to remove spurious cells
metrics = output.info.summary.classification.metrics;

% This row gives the cell SNRs; for the other metrics, check
% remove_redundant.m inside the helper functions
cell_snrs = metrics(fmap('T_maxval'), :);
```
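To see this distribution, a minimal sketch (using cell_snrs from the snippet above):

```
histogram(cell_snrs, 50)                       % SNR distribution of all found cells
xlabel('Cell SNR'), ylabel('Number of cells')
% Choose config.thresholds.T_min_snr near the point where the Gaussian-like
% null bulk ends and the high-SNR tail of real cells begins.
```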

I bet that you will have found all the cells in your movie with the default value, so the problem is not a harsh cutoff on recall but on precision. This brings me to my next topic.

If you find that the algorithm actually finds all the cells but then discards them, this means the precision metrics are too harsh for your movie. Recall that EXTRACT has two steps: cell finding and cell refinement. During cell refinement, EXTRACT achieves near-perfect precision thanks to the thresholds. Here is an example threshold (check remove_redundant.m for the rest):

```
metrics(fmap('S_corruption'), :) >= thresholds.spatial_corrupt_thresh  % line 117 of remove_redundant.m
```

It is likely that some of these thresholds are too tight. Luckily, you can check this from your already EXTRACTed output like this:

```
[fmap, ~] = get_quality_metric_map;
metrics = output.info.summary.classification.metrics;
corruptions = metrics(fmap('S_corruption'), :);
```

Then, you can create a histogram from corruptions and check whether the default value of 0.7 works for your movie. You can do the same for the other thresholds and tune the parameters for your movie. In my experience, S_corruption is usually the first metric to look at.
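Concretely, a minimal sketch of that check (using corruptions from the snippet above):

```
histogram(corruptions, 50)                       % distribution of the S_corruption metric
xlabel('S\_corruption'), ylabel('Number of cells')
% If many real cells fall above the default cutoff of 0.7, raise
% config.thresholds.spatial_corrupt_thresh accordingly.
```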

I am working towards writing some scripts that will ease these processes over time, but those will still take some time to come out. On the other hand, these tricks are very helpful (and relatively easy) to master; once you master them, there is in practice no movie that you cannot EXTRACT cells from, at least in my experience so far.

fatihdinc commented 3 years ago

Let me give you an example of a histogram of the spatial corruptness metric. The x-axis is S_corruption, the y-axis is the number of cells. As you can see, there are two distinct Gaussian-like shapes here: one at a lower value corresponding to actual cells, and one at a larger value corresponding to spurious cells. If I used the default cutoff of 0.7, many of the actual cells would be discarded. Thus, I increased config.thresholds.spatial_corrupt_thresh to around 1.5-2 for this movie.

[Screenshot: histogram of the S_corruption metric showing two Gaussian-like modes]
rl72 commented 3 years ago

Dear Fatih,

You are totally right; I have noticed indeed that the algorithm finds all the cells but eliminates some of them during the refinement step. I will try playing around with the precision metrics as you suggested and will let you know the outcome! Thanks again for the thorough explanation!

rl72 commented 3 years ago

Hey Fatih,

There is actually something I was wondering about. I noticed that the parameters (at least the highpass and lowpass filters) that I found to be optimal for a shorter piece of a movie are suboptimal when applied to the same movie at full length (2 minutes versus 20 minutes, for example). I would like to better understand how the length of the movie affects the output; aren't these filters applied frame by frame?

Thanks and best wishes, Lumi

fatihdinc commented 3 years ago

Hey Lumi,

Good question! The filtering is done frame by frame, but it is possible that the movie has a non-stationary background, so as you include more frames, the optimal filtering parameters change. I would think that with a sufficiently long movie, the parameters would stop fluctuating, though.

That being said, your filtering parameters are very mild. So you could actually just cancel filtering altogether by setting config.spatial_highpass_cutoff=inf and config.cellfind_filter_type='None'. I hope this is helpful!

Fatih

rl72 commented 3 years ago

Dear Fatih,

Thanks a lot for your answer. I have one more question: are the output traces of EXTRACT already denoised?

Thanks, Lumi

fatihdinc commented 3 years ago

Hey Lumi,

There is no explicit denoising applied to the traces by EXTRACT, although you can enable it by setting config.smooth_T=1 (0 by default). EXTRACT thresholds activity below 0 (which is, by assumption, noise; see the paper), but you can choose to receive the raw activity by setting config.trace_output_option='raw'. This could lead to negative-going spikes, though, which are usually introduced in the time-dependent background removal stage of the preprocessing if the movie has a highly oscillating background. Please check the FAQs on this.
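To make the options above concrete, a small sketch using only the settings mentioned in this thread:

```
config.smooth_T = 1;                      % optional trace smoothing (0 by default)
config.trace_output_option = 'raw';       % raw traces; may contain negative-going spikes
% config.trace_output_option = 'nonneg';  % or non-negative traces, as used earlier in this thread
```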

Best, Fatih

rl72 commented 3 years ago

I see. Then I just want to make sure: is the baseline fluorescence already subtracted in the output Ca2+ traces (be it raw or non-negative)?

Thanks and best regards, Lumi

fatihdinc commented 3 years ago

Yes. That happens during preprocessing, where we compute delta F/F.

rl72 commented 3 years ago

Hi Fatih,

Thanks a lot for your answers. Is it then possible to obtain the raw level of fluorescence, i.e., without the dF/F computation (and without turning the whole preprocessing module off), or do you recommend using the output as it is? Otherwise, is it possible to modify and play around with the parameters used in the baseline removal method (for example, the constant used to define the temporal interval, or the number of time bins across which the mean intensity value is calculated)?

Best wishes, Lumi

fatihdinc commented 3 years ago

Dear Lumi,

Background removal calculates those values automatically from the movie, but at a high level, background removal is just a module that helps. You can always turn it off if you think you like the movie better without it.

On your other question, it is not clear to me what the main purpose of not doing dF/F would be. dF/F is a process that simply normalizes the movie and amplifies the activity relative to the background. For further analysis, we do not want the background to bring additional signal into the calcium traces, so dF/F is more or less a standard in the field. Also, if you do not perform dF/F, then there is little purpose in using EXTRACT either. Remember, EXTRACT is helpful in the sense that it ignores large positive contaminants. If you just use the raw movie, the noise structure completely changes and you lose most of the advantage EXTRACT brings in ignoring those large noise contaminants.

On the other hand, if you wish to obtain the raw traces without dF/F, you could always do a multivariate regression of the raw movie onto the cell filters that EXTRACT finds, as sketched below. I want to emphasize that this is not something we advise.
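For illustration only, a minimal sketch of such a regression (assumptions: M_raw is the unprocessed movie as an h-by-w-by-t array, output is the EXTRACT output struct, and the variable names are hypothetical):

```
[h, w, t] = size(M_raw);
S = reshape(full(output.spatial_weights), h * w, []);  % pixels x cells
M2 = reshape(M_raw, h * w, t);                         % pixels x time
T_raw = S \ M2;                                        % least-squares traces, cells x time
```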

Best, Fatih

rl72 commented 3 years ago

Dear Fatih,

Thank you for the clarifications. I was basically wondering whether the default parameters for baseline subtraction and dF/F computation were the best fit for my movie, or whether they are something I should choose and change depending on the type of signal I have (for example, the events in my movie are quite long, so having too short a temporal window for calculating the baseline, as described in the paper, would be disadvantageous). But it is great that these values are calculated automatically; I will take your advice and go with the dF/F then. Thanks a lot!