[general need] Handling BIDS structure

brosscle commented 5 years ago

BIDS (Brain Imaging Data Structure: http://bids.neuroimaging.io) is a standard for the organization of neuroimaging data. The main format is NIfTI + JSON: the NIfTI file contains the image, while the JSON file contains the related metadata. Recent neuroinformatics tools can usually import a BIDS-structured folder and work with it. Some can also export a BIDS-structured folder.
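To illustrate the NIfTI + JSON pairing, here is a minimal sketch using only the Python standard library. The function name is hypothetical, and it only looks for a sidecar sitting next to the image with the same stem; it does not handle JSON files placed higher up in the dataset tree.

```python
import json
from pathlib import Path


def pair_images_with_sidecars(bids_root):
    """Map each NIfTI image under bids_root to its JSON metadata (or None)."""
    pairs = {}
    for image in sorted(Path(bids_root).rglob("*.nii*")):
        if not image.name.endswith((".nii", ".nii.gz")):
            continue
        # "sub-01_T1w.nii.gz" -> "sub-01_T1w.json"
        stem = image.name.split(".nii")[0]
        sidecar = image.with_name(stem + ".json")
        metadata = json.loads(sidecar.read_text()) if sidecar.exists() else None
        pairs[image.relative_to(bids_root).as_posix()] = metadata
    return pairs
```

The returned dictionary is keyed by the image path relative to the dataset root, so an image without a sidecar shows up with `None` instead of being silently dropped.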

I think handling this convention will allow POPULSE to be easily used by a large number of researchers who also use other tools that deal with BIDS data.

servoz commented 5 years ago

@brosscle Thank you for your comment. Indeed, it would be a very good idea to implement in MIA (and more generally in the populse project) what is needed to deal with BIDS data. It should be an enhancement to do ASAP. Because we currently have limited resources on the populse project, feel free to open PRs in this direction! Welcome to the machine ...

montigno commented 2 years ago

PyBIDS (https://bids-standard.github.io/pybids/) seems to be a good tool for dealing with BIDS data.

How do you want to import the BIDS data into the 'Data Browser'? By adding an 'import BIDS data' function in 'File', or in the popup menu of the Data Browser?

With the 'Add document' function of the same popup menu, you can easily add *.nii.gz files. So it should be pretty easy (I hope so!) to work with BIDS files.

servoz commented 2 years ago

In my opinion, there are 2 main pieces of work for this ticket:

Importing the data into mia
- For this there are currently 2 ways:
  - File > Import. This way currently uses mri_conv, which converts the data to the format used by mia and (especially) retrieves the tags automatically. This is, in my opinion, the most interesting way to import data into mia, because the power of mia is the ubiquitous nature of the database and thus the access to metadata.
  - The Add path button under the table, and the "Add document" action of the popup menu (self.action_add_scan = self.menu.addAction("Add document")). These two methods give the same result, i.e. the import of the raw data without the associated tags.
- I think we should keep both ways of importing the data, but the first one is essential as it gives access to the metadata.

The operation under the hood of mia in accordance with the BIDS format
- Currently everything is put in big bags: projects_mia/project_name/data/derived_data, projects_mia/project_name/data/downloaded_data and projects_mia/project_name/data/raw_data.
  - projects_mia/project_name/data/raw_data is for data from File > Import.
  - projects_mia/project_name/data/downloaded_data is for data coming from the second way of importing (without tags).
  - projects_mia/project_name/data/derived_data is for data produced by calculations in mia.
- Currently the internal machinery of mia knows how to deal with these bags for i/o. The big challenge of this ticket is certainly on this side: changing the internal machinery of mia to work with the BIDS format.

In short, I think we will have to be able to add the raw data with File > Import, but also with the other way of importing (Add document). The bulk of the work will certainly be under the hood of mia, to make it able to manage BIDS.
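The three data folders described above can be summarised as a small lookup. This is only a sketch of the layout as written in this comment; the function name and the wording of the descriptions are hypothetical and not part of mia.

```python
from pathlib import PurePosixPath

# Origin of a document, keyed by its sub-folder of data/.
ORIGINS = {
    "raw_data": "imported with File > Import (tags available)",
    "downloaded_data": "added with Add document / Add path (no tags)",
    "derived_data": "produced by a calculation in mia",
}


def document_origin(relative_path):
    """Return the data sub-folder of a path such as 'data/raw_data/scan.nii'.

    Returns None when the path does not follow the data/<folder>/<file> layout.
    """
    parts = PurePosixPath(relative_path).parts
    if len(parts) >= 3 and parts[0] == "data" and parts[1] in ORIGINS:
        return parts[1]
    return None
```

An export to BIDS could use such a lookup to decide where a document belongs in the exported tree, for example keeping results of calculations apart from the raw data.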

montigno commented 2 years ago

OK, I am starting to understand how populse should manage BIDS data (by importing or by adding a document).

I have to change a lot of things on the mri_conv side.

A question: should we import everything into the BIDS database? For example, files with extensions such as pklz, js, rst (examples among others!)

servoz commented 2 years ago

Excuse me, but I don't understand exactly.

> A question: should we import everything into the BIDS database? For example, files with extensions such as pklz, js, rst (examples among others!)

If by BIDS database you mean the mia database (displayed in the mia Data Browser), I'm not sure we should import everything. For example, we currently import nii and json; it's better to have too much than too little, but as the json data normally matches the tags in the mia database, I think we could do without it. Beyond that, it is difficult to say exactly whether all the data will be needed. For example, unless I am mistaken, a .pklz file is a compressed serialised pickle file, a general format that can be used and generated by some applications, for example nipype. So I don't really know if all the .pklz files should always be imported...

It is therefore quite difficult to say a priori whether everything must be imported. I would say that it depends on the file in question and on whether it is important for mia in terms of metadata (the tags for a document, but also the additional data necessary for the calculation, like the bvals/bvecs for DTI data) ...
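One way to make this "it depends on the file" rule explicit is a small filter by extension. The lists below are a hypothetical policy for illustration only, not a decision taken in this ticket.

```python
# Hypothetical policy: images are imported, companions carry metadata
# or data needed for calculations, everything else is skipped.
IMAGE_SUFFIXES = (".nii", ".nii.gz")
COMPANION_SUFFIXES = (".json", ".bval", ".bvec")


def select_files_to_import(paths):
    """Split paths into (images, companions, skipped) by file extension."""
    images, companions, skipped = [], [], []
    for path in map(str, paths):
        if path.endswith(IMAGE_SUFFIXES):
            images.append(path)
        elif path.endswith(COMPANION_SUFFIXES):
            companions.append(path)
        else:
            skipped.append(path)
    return images, companions, skipped
```

Returning the skipped files, instead of dropping them silently, makes it easy to review what was left out and to extend the lists later according to the needs.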

montigno commented 2 years ago

OK, we start with the simple things and then we will adapt according to the needs.

montigno commented 2 years ago

After discussion with @servoz (Eric) and @urlub (Paul), it seems easier to operate as follows:

- we open the BIDS data with mri_conv and export them as NIfTI/JSON to populse_MIA
- we continue to work with the NIfTI/JSON files (DataBrowser, Pipeline Manager...)
- the user will have the possibility to export their processed data to BIDS from MIA (where to put the 'export Data to BIDS' menu?)

The difficulty will be not to lose too much metadata when converting BIDS to NIfTI/JSON with mri_conv.

What do you think @brosscle ?

brosscle commented 2 years ago

Hi! I think that it is the simplest way to support BIDS without modifying the core of the software and how MIA manages its database. As a first version, I guess it is enough for users to deal with BIDS databases, so it's perfect :)

As you said, the tough point is the evolution of the standard and of the metadata conventions inside the BIDS community. Maybe it is possible to restrict the handling of BIDS datasets to modalities with a mature standard (i.e. no CT...?).

Finally, you said that "the user will have the possibility to export their processed data to BIDS from MIA". I think it will be useful to allow the export of the full project, containing both raw and processed data.

I think it is a real benefit for MIA, congratulations ! Clément

servoz commented 2 years ago

After thorough examination of this ticket, it seems that the optimum solution is to change nothing in the current mia machinery (nothing is impossible, but it would require a lot of work, we think). The idea is to work on how to transfer data from any format (including BIDS) into mia and how to export data from mia to BIDS.

So in a nutshell : formatMIA

> the user will have the possibility to export their processed data to BIDS from MIA (where to put the 'export Data to BIDS' menu?)

I think we could include the 'export Data to BIDS' just below the import action in the File menu:

[Screenshot from 2021-11-17 15-30-59]

montigno commented 2 years ago

Thank you @brosscle !

Yes, the user will have the possibility to export the full project, containing both raw and processed data.

Ok @servoz for menu location.

montigno commented 2 years ago

to @servoz :

I pushed the modifications I made to the branch 'issue#58'. I added 'Export to BIDS' in the File menu; for the moment it only prints the Data Browser list in the terminal.

I used 'self.project.session.get_documents_names(COLLECTION_CURRENT)'

Is that correct?

servoz commented 2 years ago

Great!!! I just did a test with data from a mia calculation. We can see, in the stdout, the resulting list showing all the documents in the database in use in mia, with data in the raw_data and derived_data directories!

export to BIDS (11 files)
['data/derived_data/alej170316-IRMFonct.+perfusion-2016-03-17083444-0-T13DSENSE-T1TFE-000425.000_seg8.mat', 'data/derived_data/c1alej170316-IRMFonct.+perfusion-2016-03-17083444-0-T13DSENSE-T1TFE-000425.000.nii', 'data/derived_data/c2alej170316-IRMFonct.+perfusion-2016-03-17083444-0-T13DSENSE-T1TFE-000425.000.nii', 'data/derived_data/c3alej170316-IRMFonct.+perfusion-2016-03-17083444-0-T13DSENSE-T1TFE-000425.000.nii', 'data/derived_data/c4alej170316-IRMFonct.+perfusion-2016-03-17083444-0-T13DSENSE-T1TFE-000425.000.nii', 'data/derived_data/c5alej170316-IRMFonct.+perfusion-2016-03-17083444-0-T13DSENSE-T1TFE-000425.000.nii', 'data/derived_data/c6alej170316-IRMFonct.+perfusion-2016-03-17083444-0-T13DSENSE-T1TFE-000425.000.nii', 'data/derived_data/malej170316-IRMFonct.+perfusion-2016-03-17083444-0-T13DSENSE-T1TFE-000425.000.nii', 'data/derived_data/y_alej170316-IRMFonct.+perfusion-2016-03-17083444-0-T13DSENSE-T1TFE-000425.000.nii', 'data/raw_data/alej170316-IRMFonct.+perfusion-2016-03-17083444-0-T13DSENSE-T1TFE-000425.000.nii', 'data/raw_data/alej170316-IRMFonct.+perfusion-2016-03-17083444-1-BOLD_CVR_7_53sl_ModeratePNSSENSE-FEEPI-001212.000.nii']

From here you can access the tags you need by querying the database (as you did with get_documents_names()).

This will be much better than parsing json files, which are no longer up to date anyway as soon as the user adds/changes the tags or when pipelines are launched!

Take a look at populse_db or directly in populse_mia to see how we can query the database. Maybe in populse_mia it's easier because you can associate, with a little introspection, the command to the result, which is maybe easier than looking at populse_db which can seem a bit abstract.

montigno commented 2 years ago

I pushed a new version to the branch 'issue#58'; now I can retrieve the tags of each document.

I added two files: export_bids.py & Modalities_BIDS.yml in 'populse_mia/python/populse_mia/data_manager'

Is this a problem?

montigno commented 2 years ago

I forgot to mention one thing: pybids must be installed (pip3 install pybids)
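Until pybids is listed in the mia requirements, the export could check for it before being offered. This is a minimal sketch with a hypothetical helper name; the only assumption about PyBIDS is that it is imported as `bids`.

```python
import importlib.util


def is_module_available(name):
    """True if the top-level module `name` can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


# PyBIDS is installed as "pybids" but imported as "bids".
BIDS_EXPORT_ENABLED = is_module_available("bids")
```

With such a flag, the 'Export to BIDS' action could be disabled with an explanatory message instead of failing at import time.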

servoz commented 2 years ago

Cool! Keep going!! Currently, the export_bids modules are not yet connected.

We may have to see where to place the modules and add pybids to the mia requirements. But when that's all that's left to do, the hard part will be done.

I think for the moment it's better to go quick and dirty, to see a concrete result and the issues encountered ASAP.

montigno commented 2 years ago

The export to BIDS seems to work (for me), for raw data only for the moment; the json files remain to be completed.

How can I retrieve all the tags and values of a document? We can loop on 'self.project.session.get_value', but I think there is a faster way to retrieve the dictionary.

Thank you !

servoz commented 2 years ago

Maybe you want to use get_fields_names() and get_value() from populse_db?
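A minimal sketch of that loop, assuming that get_fields_names() takes the collection and that get_value() takes the collection, the document name and the field (signatures to be checked against the populse_db version in use). FakeSession is a hypothetical in-memory stand-in, only there to make the sketch runnable without a mia project.

```python
def document_tags(session, collection, document_name):
    """Collect every tag (field) value of one document into a plain dict.

    Assumed signatures: session.get_fields_names(collection) and
    session.get_value(collection, document_name, field).
    """
    return {
        field: session.get_value(collection, document_name, field)
        for field in session.get_fields_names(collection)
    }


class FakeSession:
    """Hypothetical stand-in for the database session (not populse_db code)."""

    def __init__(self, documents):
        self._documents = documents

    def get_documents_names(self, collection):
        return list(self._documents)

    def get_fields_names(self, collection):
        fields = []
        for tags in self._documents.values():
            fields.extend(f for f in tags if f not in fields)
        return fields

    def get_value(self, collection, document_name, field):
        return self._documents[document_name].get(field)
```

In mia, the session would be self.project.session and the collection COLLECTION_CURRENT, as in the get_documents_names() call already used on the branch.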

I have just tested several times with different data, and each time mia crashes during Export to BIDS. Did you try with this kind of data, obviously after import from mri_conv into mia?

servoz commented 2 years ago

Ooops, sorry, I do too many things at the same time (beyond one, I start to have trouble!) and I only skimmed your message ... I think you want to use get_documents() instead. The returned object is not really a dictionary, but it looks like one! Otherwise, the one of us who knows populse_db best is @sapetnioc, I think ...

montigno commented 2 years ago

to @servoz, now it should work with your data (I hope), for raw data only for the moment.

There is still a lot to do for the export to work well, it will be necessary to gradually complete the BIDS modalities dictionary (in Modalities_BIDS.yml file).
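As an illustration of what such a modalities dictionary does, here is a sketch of a keyword-based lookup from a sequence name to a BIDS datatype and suffix. The rules below are hypothetical: the actual content of Modalities_BIDS.yml is not shown in this thread, and real sequence names will need more careful matching.

```python
# Hypothetical keyword rules, checked in order: (keyword, (datatype, suffix)).
MODALITY_RULES = [
    ("BOLD", ("func", "bold")),
    ("T1", ("anat", "T1w")),
    ("T2", ("anat", "T2w")),
    ("DTI", ("dwi", "dwi")),
]


def guess_bids_modality(sequence_name):
    """Return (datatype, suffix) for a sequence name, or None if no rule matches."""
    upper = sequence_name.upper()
    for keyword, target in MODALITY_RULES:
        if keyword in upper:
            return target
    return None
```

Returning None for unknown sequences leaves room for asking the user, rather than exporting a file under a wrong modality.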

I looked for a way to get the dictionary of tags/values for each document. In the 'database.py' file of populse_db, at line 255, we find all the methods to query databases, and there does not seem to be a method to retrieve all the tags of a document at once (I only skimmed it, because I'm lazy). I think I will loop on get_value().

servoz commented 2 years ago

OK, it works on the data. What is still blocking for the derived_data and downloaded_data?
The operation is quite long, but I think it's because the nii files are compressed in gzip format.
I note that your branch is strange. I think you have merged master with unresolved conflicts. Please do not merge your branch into master until you are sure that your branch is not corrupted anymore (the fastest way would be to start from a clean branch and add only what you did. Try to do it quickly, before there is too much to add).
You need a way to get a dictionary with all the tags for a document? As I wrote above, in my opinion the easiest method to use in this case is get_documents(). It returns a list of objects that are not real dictionaries (in the isinstance() sense), but you can work with them almost like a dictionary. You should play with an object returned by get_documents() in mia (just do ack get_documents in a shell from populse_mia/python/populse_mia to see all occurrences - maybe you need to install ack first?); you will see that it is very close to a dictionary. @sapetnioc, do you agree?

servoz commented 2 years ago

@brosscle, please can you test the issue#58 branch with your data (only after File > Import, no calculation) in order to orientate the work from the beginning and not to start in a wrong way.

montigno commented 2 years ago

> OK, it works on the data. What is still blocking for the derived_data and downloaded_data?
>
> The operation is quite long, but I think it's because the nii files are compressed in gzip format.
>
> I note that your branch is strange. I think you have done some merge master with unresolved conflicts. Please do not merge your branch into master until you are sure that your branch is not corrupted anymore (the fastest way would be to start from a clean branch and add only what you did. Try to do it quickly before there is too much to add).
>
> You need a way to get a dictionary with all the tags for a document? As I wrote above, in my opinion the easiest method to use in this case is get_documents(). It returns a list of objects that are not real dictionaries (in the isinstance() sense) but you can work with them almost like a dictionary. You should play with an object returned by get_documents() in mia (just do ack get_documents in a shell from populse_mia/python/populse_mia to see all occurrences - maybe you need to install ack before?), you will see that it is very close to a dictionary. @sapetnioc do you validate ?

For the moment, the export works for raw data; for the others, I think it will be quick. The operation is indeed quite long because of the compression in gzip format.

I had problems with 'git pull' (on the branch issue#58): the message invited me to commit and then git pull, then to redo the commit and git pull, and so on without end. Finally I did git stash, but I had to put my modified code manually on GitHub (via Firefox) and redo git pull. Now it seems to work. I have not merged with master.

For the tags, after our email exchange, we are waiting for @sapetnioc's response. In the meantime I will put in 'dirty' code.

Thank you !

servoz commented 2 years ago

The branch is strange; see for example https://github.com/populse/populse_mia/commit/9b2abc33be3362967c008e67145bdbcba062d8a2:

<<<<<<< HEAD
from populse_mia.data_manager.export_bids import ExportToBIDS
=======

import threading
import sys
>>>>>>> 7d8de90b63e7b7fe50fc1d5c57e67880a3664873

This indicates that there is an unresolved conflict after a merge. For me, when I see this, it is hard to have confidence in the whole branch. To be sure that all is well, I would tend to start from a clean branch (start from master and create a new branch) and add only the changes for the work on BIDS. This is not the most elegant method, but it is the fastest when there are not too many changes yet (I think you have worked on 2 or 3 modules for now).
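To check quickly whether other modules of a branch still contain such markers, a small scan like the following can help. It is only a sketch with a hypothetical function name; a line made of exactly "=======" can also appear legitimately in some text files, which is why the scan is limited to *.py by default.

```python
from pathlib import Path


def find_conflict_markers(root, pattern="*.py"):
    """Return (file, line number, line) for each merge conflict marker found."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        lines = path.read_text(errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            # Markers written by git: "<<<<<<< ours", "=======", ">>>>>>> theirs"
            if line.startswith(("<<<<<<< ", ">>>>>>> ")) or line == "=======":
                hits.append((str(path), number, line))
    return hits
```

An empty list is a necessary condition for a clean branch, not a sufficient one: a conflict resolved the wrong way leaves no marker behind.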

montigno commented 2 years ago

Maybe I need to create a new clean branch?

Thanks to @sapetnioc's explanations, now I can easily save the dictionaries in the json files.
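For reference, a minimal sketch of writing a tags dictionary to a json file next to an image. The function names are hypothetical, and the fallback for values that json cannot encode (dates as ISO strings, anything else as str) is an assumption, not necessarily what export_bids.py does.

```python
import json
from datetime import date, datetime
from pathlib import Path


def _to_jsonable(value):
    """Fallback for values that json cannot encode natively."""
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    return str(value)


def write_sidecar(image_path, tags):
    """Write tags as '<image stem>.json' next to image_path; return the json path."""
    image_path = Path(image_path)
    stem = image_path.name.split(".nii")[0]  # handles both .nii and .nii.gz
    sidecar = image_path.with_name(stem + ".json")
    text = json.dumps(dict(tags), indent=2, sort_keys=True, default=_to_jsonable)
    sidecar.write_text(text)
    return sidecar
```

The `default=` hook is only called for values that json cannot serialise itself, so numbers, strings and lists are written unchanged.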

There is still a lot to do in the associations of MRI sequences with BIDS modalities.

montigno commented 2 years ago

Just to let you know that the development of the BIDS converter is more complicated than expected. For some BIDS entities, it is necessary to enter the values manually, for example 'task' for functional MRI, 'acq' for sequences of the same type but with different parameters (lowResolution or highResolution), and many others ... After a short meeting with @servoz, the idea would be to propose an interface (table) which will open after the action 'convert to BIDS', with mandatory fields highlighted (colors ...). I will also make changes in mri_conv. In the basket tab, I will change the list (at the bottom) to a table, in order to complete some fields and save the associated tags in the json file.
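To make the role of these entities concrete, here is a sketch that builds a BIDS file name from entities and reports the mandatory ones that are still missing. The REQUIRED table is an illustrative subset chosen for this example, and the function names are hypothetical.

```python
# Order in which these entities appear in a BIDS file name.
ENTITY_ORDER = ("sub", "ses", "task", "acq", "run")
# Entities the user must provide, per suffix (illustrative subset).
REQUIRED = {"bold": ("sub", "task"), "T1w": ("sub",)}


def missing_entities(suffix, entities):
    """List the mandatory entities that have no value yet."""
    return [key for key in REQUIRED.get(suffix, ("sub",)) if not entities.get(key)]


def bids_filename(suffix, entities, extension=".nii.gz"):
    """Build e.g. 'sub-01_task-rest_bold.nii.gz' from a dict of entities."""
    missing = missing_entities(suffix, entities)
    if missing:
        raise ValueError(f"missing mandatory entities for {suffix}: {missing}")
    parts = [f"{key}-{entities[key]}" for key in ENTITY_ORDER if entities.get(key)]
    return "_".join(parts + [suffix]) + extension
```

The list returned by missing_entities() is the kind of information the proposed table could use to highlight the fields that still have to be filled in.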

To be continued...

servoz commented 2 years ago

> @brosscle, please can you test the issue#58 branch with your data (only after File > Import, no calculation) in order to orientate the work from the beginning and not to start in a wrong way.

I think we lost @brosscle ....

brosscle commented 2 years ago

Hi! Oh, I hadn't seen this mention before, sorry! Actually, I posted this issue 3 years ago as an improvement that, from my point of view, would greatly benefit Populse. It was just about handling the BIDS structure, because I could see that this architecture convention was going to take the lead in neuroimaging research studies. Now I don't really work with Populse anymore, and I don't have a BIDS-organized dataset as defined in https://bids.neuroimaging.io, because I work with CT scans and CT is not yet clearly defined in the convention.

There are a lot of BIDS-organized datasets publicly available on https://openneuro.org/search, so I guess the easiest way to test your development on MIA would just be to load one of those datasets, probably an MRI one.

servoz commented 1 year ago

If I have some time one of these days, I'll take care of this ticket. In the meantime, given the lack of work done so far on the subject, it's best to close it!

populse / populse_mia

[general need] Handling BIDS structure #58