nolanlab / citrus

Citrus Development Code
GNU General Public License v3.0

Cluster output files seem to be converted away from sense #76

Open jtheorell opened 9 years ago

jtheorell commented 9 years ago

Hi again, I would once again like to thank you for great software. Here comes a more complicated matter: I would like to export the data for specific clusters and work with it in other software. To do this, I need to be able to track which event (or row, in a spreadsheet format) is which.

What I do is the following:

1. I export an FCS file from FlowJo with the right number of events (let us call it file x).
2. I re-import file x into FlowJo and export the data to a csv file. I can repeat this multiple times with file x, and it always gives me the same order of events, based on the time they were acquired.
3. File x is then used for the citrus analysis, and I leave the variables "event number" and "time" out of transformation and scaling.
4. When I am done with the citrus analysis, I export the main cluster and the clusters of interest.

Here my issues start:

First: if I import these files into R, I can see that all parameters in them have a maximum value of 1024, regardless of the original spread and of whether I checked the scaling and/or transformation boxes in the UI. (This is not true for the new parameter fileEventNumber, which sadly does not provide unique integers but instead non-unique values with multiple decimals.) This means that the data generated at this stage cannot be used for further analysis, as many of the parameters "hit the roof".
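A minimal sketch of how this observation could be checked outside of R, assuming the exported cluster file has first been converted to CSV. It is written in Python with only the standard library; the marker names `CD3` and `CD8`, the sample values, and the helper names `ceiling_report` and `is_unique_integer_id` are hypothetical and only illustrate the two symptoms described above (values piling up at 1024, and an event identifier that is neither integer nor unique).

```python
import csv
import io


def ceiling_report(csv_text, ceiling=1024.0, tol=1e-6):
    """Return {column: (max_value, fraction_of_events_at_ceiling)}.

    Assumes every column in the CSV is numeric.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name in columns:
            columns[name].append(float(row[name]))

    report = {}
    for name, values in columns.items():
        at_ceiling = sum(1 for v in values if abs(v - ceiling) <= tol)
        report[name] = (max(values), at_ceiling / len(values))
    return report


def is_unique_integer_id(values):
    """True only if the values are whole numbers with no duplicates."""
    all_whole = all(float(v).is_integer() for v in values)
    return all_whole and len(set(values)) == len(values)


# Hypothetical data shaped like the export described above.
sample = """CD3,CD8,fileEventNumber
1024,310.5,12.25
1024,1024,12.25
870.2,1024,40.75
"""

report = ceiling_report(sample)
for name, (top, fraction) in report.items():
    print(f"{name}: max={top}, fraction at ceiling={fraction:.2f}")

# An identifier like this cannot be used to trace single events.
print(is_unique_integer_id([12.25, 12.25, 40.75]))
```

A parameter whose maximum equals the ceiling and which has a large fraction of events sitting exactly on it has most likely been clipped rather than rescaled.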

Second: if I import the main cluster file, containing all events, into FlowJo and export it as a csv file, the order of the events has changed, as can be judged from the "time" parameter. If I sort on one or even all parameters in the file, I cannot recreate the original order, or at least the values have been converted so much that I cannot judge from correlation plots whether they are ordered correctly. If I only consider the "time" parameter, I get a perfect correlation, meaning that there is the same number of events in each time frame, which suggests that the data is somehow the same. This does not help me, however, as it does not give me single-event resolution.
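The limitation of matching on "time" alone can be sketched as follows. This is an illustration in Python under stated assumptions, not a description of what citrus or FlowJo do internally: the helper name `time_match_summary` and the time values are hypothetical. It shows that two exports can have identical event counts per time value while many events remain impossible to pair one-to-one.

```python
from collections import Counter


def time_match_summary(original_times, exported_times):
    """Compare two exports using only their time values.

    Returns (same_distribution, ambiguous_events), where
    same_distribution is True if every time value occurs equally often
    in both exports, and ambiguous_events counts the original events
    that share their time value with at least one other event.
    """
    original_counts = Counter(original_times)
    exported_counts = Counter(exported_times)
    same_distribution = original_counts == exported_counts
    ambiguous_events = sum(n for n in original_counts.values() if n > 1)
    return same_distribution, ambiguous_events


# Hypothetical time stamps: same events, different row order.
original = [1, 1, 2, 3, 3, 3]
exported = [3, 1, 2, 3, 1, 3]

same, ambiguous = time_match_summary(original, exported)
print(same, ambiguous)
```

Here the time distributions agree, yet five of the six events share a time value with another event, so only one row can be traced back unambiguously. A unique, untransformed integer identifier per event would be needed to recover the rest.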

I would be very thankful for any input on these problems. Tell me if you want me to split this up into multiple smaller issues. Jakob