wtclarke / pymapvbvd

Python port of mapVBVD
MIT License

Loading shows different behaviour on different datasets #6

Closed. Sins-code closed this issue 3 years ago.

Sins-code commented 3 years ago

Hello all, I'm doing scientific research on coil compression possibilities. To be able to do everything in a single programming language, I decided to use this module instead of importing some MATLAB functionality into Python. While working I found an error I can't really explain to myself: I work with two different .dat MRI files, 1) with dimensions (256, 20, 261, 1, 20, 1, 1, 1, 3, 1, 9, 1, 1, 1, 1, 1) and 2) with dimensions (256, 20, 208, 1, 44, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1) after removing oversampling. Both times I want to load the whole dataset at the start with data = twixObj.image['']. But this works only for dataset 1. If I use it on the second dataset, I get this error message:

line 23:
    data = twixObj.image['']
line 646, in __getitem__:
    out = self.readData(mem, ixToTarg, ixToRaw, selRange, selRangeSz, outSize)
line 748, in readData:
    fid.seek(mem[k] + szScanHeader, 0)
OSError: [Errno 22] Invalid argument

If I use, for example, the command data = twixObj.image[:,:,:,:,20], the loading works... but I can never load the whole of dataset 2! Any ideas why this happens? If you want, I can try to share the dataset that does not work with the module on some platform, so you can reproduce the error. Many thanks.
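
For context, a minimal sketch of the two calls being compared (assuming the usual mapvbvd.mapVBVD entry point; the filename is a placeholder):

import mapvbvd

# Placeholder filename, not one of the actual datasets.
twixObj = mapvbvd.mapVBVD('example.dat')

# Full load of the image data; this is the call that fails on dataset 2.
data = twixObj.image['']

# Loading only a single index along the fifth dimension still works on dataset 2.
data_part = twixObj.image[:, :, :, :, 20]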

wtclarke commented 3 years ago

Hi @Sins-code. Are there any odd characters in the filename of the file you pass in for the second scan? In pdb, can you step through and see where the error occurs and what the value of mem[k] + szScanHeader is?

Sins-code commented 3 years ago

Greetings @wexeee, I'm glad you found the time to answer my question, many thanks! The filenames I'm passing are, for example: meas_MID00102_FID31136_t2_tse11_tra_256_4mm.dat -> works, and meas_MID00040_FID08032_REF_EPI1_256x244x44_TE6_TR580.dat -> does not work! I can't see any odd characters at first glance! I will now debug the loading and post the values of mem and szScanHeader!
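
In case it helps, a rough sketch of that debugging step. The line number comes from the traceback above; the module path for readData is an assumption and may differ on your install.

import pdb

import mapvbvd

# Reproduction sketch, using the failing file named above.
twixObj = mapvbvd.mapVBVD('meas_MID00040_FID08032_REF_EPI1_256x244x44_TE6_TR580.dat')

# Drop into the debugger just before the failing load. At the (Pdb) prompt,
# 'b mapvbvd/twix_map_obj.py:748' sets a breakpoint at the fid.seek call from the
# traceback, 'c' runs to it, and 'p k, mem[k], szScanHeader' prints the requested values.
pdb.set_trace()
data = twixObj.image['']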

Sins-code commented 3 years ago

The error happens right after executing twixObj.image[''], when calling fid.seek(mem[k] + szScanHeader, 0) in the method readData. The values are: k = 0, kmax = 27456, mem[0] is -2147439168, and szScanHeader is 192.

wtclarke commented 3 years ago

I think that mem[0] being < 0 doesn't make sense; it is trying to set the position to before the start of the file. Can you trace that back through the code to see why that occurs?

wtclarke commented 3 years ago

Perhaps see if it is an artefact of this line: https://github.com/wexeee/pymapvbvd/blob/3300290f3d626dc4c9305e326c0ea453ff5c90f6/mapvbvd/twix_map_obj.py#L681 where there is some casting.

Sins-code commented 3 years ago

I checked __getitem__ just now:

1 mem = self.memPos[ixToRaw]
2 # sort mem for quicker access, sort cIxToTarg/Raw accordingly
3 ix = np.argsort(mem)
4 mem = mem[ix]
5 ixToTarg = ixToTarg[ix]
6 ixToRaw = ixToRaw[ix]
7 # import pdb; pdb.set_trace()
8 out = self.readData(mem, ixToTarg, ixToRaw, selRange, selRangeSz, outSize)

This is the first time readData is called! But already at line 1 there are some big negative values stored near the end of the array mem, which move to the front after the sort in line 3.
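
As a tiny illustration of why the failure then shows up at the very first seek (k = 0): argsort moves any overflowed, negative offsets to the front of mem. The values below are made up, apart from the negative one reported above.

import numpy as np

mem = np.array([8192, 405504, -2147439168, 802816])  # hypothetical byte offsets
ix = np.argsort(mem)
print(mem[ix])  # the negative offset sorts first: [-2147439168, 8192, 405504, 802816]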

I will check memPos now!

wtclarke commented 3 years ago

Thanks. Yes, that is calculated via the filePos variable set here. https://github.com/wexeee/pymapvbvd/blob/3300290f3d626dc4c9305e326c0ea453ff5c90f6/mapvbvd/mapVBVD.py#L353

Sins-code commented 3 years ago

The error happens somewhere in here: after this while loop there are some negative values in the filePos array! https://github.com/wexeee/pymapvbvd/blob/3300290f3d626dc4c9305e326c0ea453ff5c90f6/mapvbvd/mapVBVD.py#L74-L154

The negative values start to appear in iteration 25942; before that everything is normal. The maximum positive value occurs in iteration 25941, with value 2147445376.00; then everything becomes negative, and the value in iteration 25942 is -2147439168.00! It looks to me like a NumPy int32 overflow! This would also explain why the error only occurs in some cases: the file that produces the error is nearly twice as big as the other one, at 2.3 GB!
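
A quick sketch of that suspicion: NumPy's fixed-width int32 wraps around once a sum exceeds 2**31 - 1. The starting value below is the one from the log; the increment is hypothetical.

import numpy as np

pos = np.int32(2147445376)  # last positive value, seen in iteration 25941
step = np.int32(100000)     # hypothetical per-acquisition increment
print(pos + step)           # wraps to -2147421920 (recent NumPy also warns about the overflow)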

Sins-code commented 3 years ago

@wexeee What is your opinion on my explanation for the problem?

wtclarke commented 3 years ago

That sounds possible (hence my incorrect guess about the explicit cast above). What type is being used here? I thought Python was fairly good at handling things like this.

I've also had issues with (MATLAB) mapVBVD before where the mdh loop parameters were centred around zero rather than just incrementing from zero. That isn't happening here, is it?

Sins-code commented 3 years ago

Seems like I was on the wrong path... filePos has dtype float64, which is big enough ^^ I should have checked that first. I'm looking for other errors now! Regarding your second question: I don't know, I'll try to check that as well.

Sins-code commented 3 years ago

Seems like I have found the error now! My debugger tells me that the variable cPos, which is used to fill filePos, is of type int32!

wtclarke commented 3 years ago

So fid.tell() returns an int32?

What OS are you on?

Sins-code commented 3 years ago

Yeah exactly. Windows 10! Using Python 3.8.10.

wtclarke commented 3 years ago

And does the negative number come from fid.tell() or some calculation applied to the cpos variable?

Sins-code commented 3 years ago

I'm trying to find out, running some more tests. Give me a second!

Sins-code commented 3 years ago

It happens in between, directly in the first iteration! cPos starts as a (presumably arbitrary-precision) Python int, then becomes an int32 exactly here, at the end of the while loop: https://github.com/wexeee/pymapvbvd/blob/3300290f3d626dc4c9305e326c0ea453ff5c90f6/mapvbvd/mapVBVD.py#L154

wtclarke commented 3 years ago

Ah, probably something to do with the NumPy types of the variables feeding into ulDMALength. Does cPos = cPos + int(ulDMALength) fix the issue?
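
For illustration, a small sketch of the suspected mechanism and of the proposed fix. All values are made up, and np.int32 merely stands in for whatever NumPy type ulDMALength actually has: once one operand is a fixed-width NumPy integer, the running total is pulled into NumPy scalar arithmetic and can wrap, whereas casting with int() keeps ordinary arbitrary-precision Python arithmetic.

import numpy as np

cPos = 2147400000               # plain Python int: byte offset accumulated so far
ulDMALength = np.int32(200000)  # hypothetical stand-in for the value read from the mdh

print(cPos + ulDMALength)       # NumPy int32 arithmetic wraps around: -2147367296
print(cPos + int(ulDMALength))  # stays a Python int: 2147600000, no overflow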

Sins-code commented 3 years ago

Yes it does! 👍

wtclarke commented 3 years ago

Great. Could you make a PR with this fix in? I can then check out some of the tests with it.

Sins-code commented 3 years ago

I still have one general question: Why is the loading process sometimes so much slower, depending on the datasets?

For example here the loading goes super fast:

pymapVBVD version 0.4.1
Software version: VD
Scan 1/1, read all mdhs: 98%|█████████▊| 1.18G/1.21G [00:01<00:00, 1.27GB/s]
Software: vd
Number of acquisitions read 15660
Data size is [512, 20, 261, 1, 20, 1, 1, 1, 3, 1, 9, 1, 1, 1, 1, 1]
Squeezed data size is [256, 20, 261, 20, 3, 9] (['Col', 'Cha', 'Lin', 'Sli', 'Rep', 'Seg'])
NCol = 512  NCha = 20  NLin = 261  NAve = 1  NSli = 20  NPar = 1  NEco = 1  NPhs = 1
NRep = 3  NSet = 1  NSeg = 9  NIda = 1  NIdb = 1  NIdc = 1  NIdd = 1  NIde = 1

read data: 0%| | 0/15660 [00:00<?, ?it/s]
read data: 0%| | 62/15660 [00:00<00:27, 568.67it/s]
read data: 2%|▏ | 254/15660 [00:00<00:18, 836.64it/s]
read data: 2%|▏ | 382/15660 [00:00<00:17, 892.67it/s]
read data: 3%|▎ | 510/15660 [00:00<00:15, 977.49it/s]
read data: 4%|▍ | 625/15660 [00:00<00:14, 1028.39it/s]
read data: 5%|▍ | 730/15660 [00:00<00:14, 1004.91it/s]
read data: 5%|▌ | 832/15660 [00:00<00:16, 878.54it/s]
read data: 6%|▌ | 958/15660 [00:01<00:17, 854.85it/s]
read data: 7%|▋ | 1086/15660 [00:01<00:16, 889.76it/s]
read data: 8%|▊ | 1214/15660 [00:01<00:15, 916.03it/s]
read data: 9%|▊ | 1340/15660 [00:01<00:14, 1001.08it/s]
read data: 9%|▉ | 1470/15660 [00:01<00:14, 961.09it/s]
read data: 10%|█ | 1626/15660 [00:01<00:12, 1110.39it/s]

Whereas here it goes really slow:

pymapVBVD version 0.4.1
Software version: VD
Scan 1/1, read all mdhs: 95%|█████████▌| 2.02G/2.12G [00:01<00:00, 1.25GB/s]
Software: vd
Number of acquisitions read 27456
Data size is [512, 20, 208, 1, 44, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1]
Squeezed data size is [256, 20, 208, 44, 3] (['Col', 'Cha', 'Lin', 'Sli', 'Rep'])
NCol = 512  NCha = 20  NLin = 208  NAve = 1  NSli = 44  NPar = 1  NEco = 1  NPhs = 1
NRep = 3  NSet = 1  NSeg = 1  NIda = 1  NIdb = 1  NIdc = 1  NIdd = 1  NIde = 1

read data: 0%| | 0/27456 [00:00<?, ?it/s]
read data: 0%| | 2/27456 [00:00<3:12:53, 2.37it/s]
read data: 0%| | 6/27456 [00:02<3:10:39, 2.40it/s]
read data: 0%| | 14/27456 [00:05<3:12:33, 2.38it/s]
Scan 1/1, read all mdhs: 100%|█████████▉| 2.12G/2.12G [00:20<00:00, 1.25GB/s]
read data: 0%| | 62/27456 [00:27<3:21:16, 2.27it/s]
read data: 0%| | 126/27456 [00:55<3:22:18, 2.25it/s]

Do you have an explanation for this? :D

Sins-code commented 3 years ago

Great. Could you make a PR with this fix in? I can then check out some of the tests with it.

Of course I can. But I will have to learn how to do it first, since I'm new to GitHub!

wtclarke commented 3 years ago

I have now merged this fix.