pdet / MultidimensionalAdaptiveIndexing


Big use of space #15

Open Dabendorf opened 1 year ago

Dabendorf commented 1 year ago

Hey, I just have a question about the disk use and what I need for it. I ran one of the experiments (the synthetic one) and I saw there are pairs of outputs, ending with -d and -w.

What is the difference between them? And do I need both of them to run the Jupyter notebook? Is one of them potentially temporary?

Manerone commented 1 year ago

What is the difference between them?

Could you print the first 10 lines of a file ending in '-w' and of one ending in '-d'? I believe -d means the file contains the data for the experiment, and -w means it contains the workload for the experiment.

And do I need both of them to run the Jupyter notebook? Is one of them potentially temporary?

If they are indeed data and workload files, then as far as I remember you don't need them to plot anything. But to be safe, I would try renaming the folder they are in before deleting them :)

Dabendorf commented 1 year ago

The output of d is gibberish, but the output of w is some data.

➜  data git:(master) ✗ head -10 alternating_zoom_in-70000000-2-0.01-d
[binary output omitted: raw bytes, not human-readable text]
➜  data git:(master) ✗ head -10 alternating_zoom_in-70000000-2-0.01-w
0 0
7e+06 7e+06
0 1
6.3e+07 6.3e+07
7e+07 7e+07
0 1
3500 3500
6.9965e+06 6.9965e+06
0 1
6.30035e+07 6.30035e+07

But if they are indeed not necessary, where is the data that is needed to print the results in the Jupyter notebook? I wanted to run all the synthetic workloads, but halfway through, it had generated 50 GB of data and I could not continue due to disk storage restrictions. Maybe this is an easy fix?

Anyway, thank you for actually answering, this is extremely useful.

pdet commented 1 year ago

Hey @Dabendorf, as @Manerone said, these are just the data and the workload. I believe we serialize the data directly, which is why it looks like gibberish if you open it in a text editor. We do this to save time when running the experiments, since we only need to generate the files once (and we had no space constraints).
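If you want to check what a -d file holds without opening it in a text editor, a minimal sketch like the one below decodes the first few values. It assumes the file is a flat array of little-endian 32-bit floats; that layout is a guess and is not confirmed anywhere in this thread, and `peek_floats` is a hypothetical helper, not part of the repository.

```python
import struct


def peek_floats(path, count=10, fmt="f"):
    """Decode the first `count` values of a raw binary file.

    ASSUMPTION: the file is a flat array of little-endian values of
    struct type `fmt` ("f" = 32-bit float, "d" = 64-bit float).
    """
    size = struct.calcsize("<" + fmt)
    with open(path, "rb") as fh:
        raw = fh.read(count * size)
    n = len(raw) // size  # the file may hold fewer than `count` values
    return list(struct.unpack("<" + fmt * n, raw[: n * size]))


# Example call (file name taken from the listing above):
# print(peek_floats("alternating_zoom_in-70000000-2-0.01-d"))
```

If the decoded numbers fall in the same range as the values in the -w file, the assumed layout is probably right; if they look random or are `nan`, try `fmt="d"` or another type.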

You can modify the code so that, instead of storing and reading files, it generates the data and workload in memory and then runs whatever experiments you want on top of them. I don't think that would be too difficult, since it mainly means skipping the serialize/deserialize steps. :-)
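As a rough illustration of that idea (not the repository's actual code), the sketch below builds a small dataset and a workload of range queries in memory and runs a plain full scan over them, with no files involved. Every name here (`generate_data`, `generate_workload`, `count_matches`) is hypothetical, and the uniform random queries are only a stand-in for the real synthetic workloads such as alternating_zoom_in.

```python
import random
from array import array


def generate_data(n_rows, n_cols, seed=0):
    """One float32 column per dimension, uniform in [0, n_rows]."""
    rng = random.Random(seed)
    return [
        array("f", (rng.uniform(0, n_rows) for _ in range(n_rows)))
        for _ in range(n_cols)
    ]


def generate_workload(n_queries, n_rows, n_cols, selectivity, seed=1):
    """Random range queries as (lows, highs) pairs.

    The per-dimension width is chosen so that each box covers roughly
    `selectivity` of the space when the data is uniform.
    """
    rng = random.Random(seed)
    width = n_rows * selectivity ** (1.0 / n_cols)
    queries = []
    for _ in range(n_queries):
        lows = [rng.uniform(0, n_rows - width) for _ in range(n_cols)]
        queries.append((lows, [lo + width for lo in lows]))
    return queries


def count_matches(columns, query):
    """Full-scan baseline: count rows inside the query box."""
    lows, highs = query
    return sum(
        all(lows[d] <= columns[d][i] < highs[d] for d in range(len(columns)))
        for i in range(len(columns[0]))
    )


if __name__ == "__main__":
    data = generate_data(n_rows=10_000, n_cols=2)
    workload = generate_workload(10, n_rows=10_000, n_cols=2, selectivity=0.01)
    for q in workload:
        print(count_matches(data, q))
```

Fixed seeds keep the runs repeatable, which was the other benefit of generating the files once. The trade-off is that the generation time is paid on every run instead of only the first.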

Dabendorf commented 1 year ago

Makes sense to me. I will have a look at what I can do. I just ran another experiment and that one runs perfectly, since it doesn't need a lot of space. Thank you for your help :)