jasp-stats / jasp-issues

This repository is solely meant for reporting of bugs, feature requests and other issues in JASP.

[Feature Request]: File size #2808

Open Franck6S opened 4 days ago

Franck6S commented 4 days ago

Description

JASP File size

Purpose

Usage & Storage of JASP files

Use-case

all JASP files

Is your feature request related to a problem?

My problem is the speed and usage of JASP, and related storage of information

Is your feature request related to a JASP module?

Unrelated

Describe the solution you would like

Reduce JASP file size: calculations respond slowly, and storage on our IT equipment is constrained.

Describe alternatives that you have considered

Testing other software and comparing it with JASP

Additional context

The following image shows the different file sizes for the same information: [image] You can see that the JASP files are 4 to 6 times larger than the Excel files,

and close to double the size of the Minitab files: [image]

tomtomme commented 3 days ago

Thx for the request. The .jasp files you point out only contain data and no analysis or graphs?

Franck6S commented 3 days ago

They contain data and an SPC chart (Quality Control module), but I always see the same situation (large file size) with other simple functions: descriptives, ANOVA, distributions... Here is an example with regression, data only in Excel versus JASP linear regression: [image] FYI: while supporting JASP deployment, I have received several complaints about JASP file sizes (storage size, problems sending files because of message size limits, ...), which was not the case with the previous software.

tomtomme commented 3 days ago

Currently you are comparing apples to oranges, so to speak. A .jasp file can contain much more than the data, such as graphs, so it will always be bigger. Sorry, but there is nothing we can do about this.

For a fair comparison of file sizes you would need to add graphs of the same quality and an analysis such as regression in Excel too, and store it as xlsx.

If you compare a csv exported from jasp and a csv exported from excel the file size will be the same. And no, a csv is not an excel file. It is an ancient file format that existed even long before excel was born.

tomtomme commented 3 days ago

Regarding minitab. Does a minitab file also only contain the data? If it also stores analysis and graphs, are they of the same quality?

Franck6S commented 3 days ago

I agree regarding the Excel data file versus the JASP analysis file, but the problem stays the same when comparing JASP and Minitab for the exact same content, and you can see that JASP is close to double the size. Be aware that this is a real issue.

Franck6S commented 3 days ago

An example from this week: an error due to lack of disk space. The main reason is that when we generate several JASP analyses (meaning JASP files), we rapidly use a lot of space, and we may or may not get an error message (last week JASP kept running without interruption for several minutes): [image]

Let us know whether you could consider reducing JASP file size in the future.

boutinb commented 2 days ago

1577 GBytes is indeed huge!! But the JASP files you listed above were ~30 MBytes, which is a totally different order of magnitude. So there are 2 different things here. A JASP file is in fact a ZIP file containing the sqlite database, some description files, some png files if plots are used, and a state. What you can do is rename a JASP file with a zip suffix and unzip it; then you can see its content and check what takes up most of the space (the json description files are zipped so they don't take up much space, but the database, png and state files are binaries). The error with the 1577 GBytes seems to be something else. Can you tell us exactly how big your data file is (how many columns and rows), and what precisely you do in the ANOVA analysis to get this error?
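The inspection described above can also be sketched in Python. This is a minimal sketch that assumes only what is stated in this thread, namely that a .jasp file is a ZIP archive. The function names and the file name in the usage comment are hypothetical, and nothing here is part of JASP itself. Python's standard `zipfile` module opens the archive directly, so renaming the file is not needed for this check.

```python
import zipfile


def list_archive_entries(path):
    """Return (name, compressed_bytes, uncompressed_bytes) for every file
    in the ZIP archive at `path`, largest compressed size first."""
    with zipfile.ZipFile(path) as archive:
        entries = [
            (info.filename, info.compress_size, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]
    # Sort by the space each entry actually occupies inside the archive.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def print_report(path):
    """Print one line per entry: compressed size, uncompressed size, name."""
    for name, compressed, uncompressed in list_archive_entries(path):
        print(f"{compressed:>12,d} {uncompressed:>12,d}  {name}")


# Usage (hypothetical file name):
# print_report("my_analysis.jasp")
```

The entries at the top of the report are the ones that dominate the file size, which is the information asked for above.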

Franck6S commented 2 days ago

It is a different case: the person who got this error message was trying to run an ANOVA and was short of disk space on his computer, and you have seen the related error message (1577 GB...). The related file that I computed was: [image] I suppose the error reported by JASP was not relevant. Overall, the main point is that JASP files take up more space than we were used to, which after some time causes disk space issues, and then we get those errors. My request is to consider making file sizes smaller in future JASP upgrades.

tomtomme commented 1 day ago

@Franck6S Again, your observation is totally correct. But as @boutinb explained, the .JASP file format is already a compressed .ZIP and there is little we can do about its size. JASP provides many features with this file format, maybe more than MINITAB. Downsizing the file format would probably mean cutting features. But that cannot be the goal here, correct? If you have suggestions for how we could achieve a smaller file size, I am all ears!

EJWagenmakers commented 1 day ago

I did notice that even for simple scenarios, JASP files can become relatively large. It would be good to study such cases by unzipping the file and assessing what components take up so much space. I suspect that the figures take up much space, but that can't be all. Some interesting detective work to be done here!

EJWagenmakers commented 1 day ago

...for instance, if an analysis uses MCMC or some other numerical method, it is probably the case that all of those samples are stored together with the output.