scratchfoundation / scratch-flash

Open source version of the Scratch 2.0 project editor. This is the basis for the online and offline versions of Scratch found on the website.
https://scratch.mit.edu
GNU General Public License v2.0

Project Format - Unreliable saving project with long list #587

Closed TheLogFather closed 5 years ago

TheLogFather commented 9 years ago

I've had this issue for quite a while now, so decided it should be noted here...

Once a list has more than a couple of hundred thousand items, project saving becomes unreliable - it quite often never stops saying "Saving...", and you have to give up. On reloading the project, it hasn't saved.

This problem often means remixes of a project with a large list end up getting corrupted: if you navigate away (i.e. go to a different page) while it is saving the remix for the first time (which you have to do if it never stops "Saving..."), then it fails to save anything coherent.

I don't see how to avoid problems if someone breaks off in the middle of it saving the remix, but the issue is really that it never seems to finish... (Note that saving long variables doesn't seem to cause the same trouble.)

A couple of example projects showing this...

Take a look at some of the remixes of http://scratch.mit.edu/projects/25217786/ (in particular, see project IDs 22699642, 41453368, 41452768 and 34809872).

And nearly all the remixes of http://scratch.mit.edu/projects/17679303/ (Note that, in order to avoid this problem when I saved the original project, I encoded all the list data into variables. It then extracts the info into the lists when it starts the project - that's what those progress bars are when you first run it.)
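As an illustration of the workaround described above (packing list data into a single variable and unpacking it when the project starts), here is a minimal Python sketch. The encoding actually used in the project is not described in this thread; the length-prefixed format and the names `encode_list` and `decode_list` are assumptions made for this example.

```python
def encode_list(items):
    """Pack items into one string; each item is written as '<length>:<text>'."""
    return "".join(f"{len(str(item))}:{item}" for item in items)


def decode_list(packed):
    """Rebuild the list of strings from the packed form produced by encode_list."""
    items = []
    pos = 0
    while pos < len(packed):
        colon = packed.index(":", pos)      # end of the length prefix
        length = int(packed[pos:colon])
        start = colon + 1
        items.append(packed[start:start + length])
        pos = start + length                # jump to the next length prefix
    return items


packed = encode_list(["10", "hello", "x:y", ""])
restored = decode_list(packed)
```

The length prefix means items may contain any character, including the `:` separator, at the cost of a few extra bytes per item.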

TheLogFather commented 9 years ago

This could be related: http://scratch.mit.edu/discuss/topic/82460/

sayamindu commented 9 years ago

This may be one reason to compress the JSON when we send it across to the server.
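A rough sketch of what compressing the project JSON before upload could look like, using Python's standard `gzip` module. How the Scratch client or server would actually do this is not specified in the thread; the function name and the sample project dictionary are illustrative assumptions.

```python
import gzip
import json


def compress_project_json(project):
    """Serialize a project dict to compact JSON bytes and gzip them.

    Returns (raw_bytes, compressed_bytes) so the two sizes can be compared.
    """
    raw = json.dumps(project, separators=(",", ":")).encode("utf-8")
    return raw, gzip.compress(raw)


# Illustrative project with one long, repetitive list (made-up data).
sample = {
    "objName": "Stage",
    "lists": [{"listName": "data", "contents": [str(i % 100) for i in range(200_000)]}],
}
raw, packed = compress_project_json(sample)
print(len(raw), "bytes raw ->", len(packed), "bytes gzipped")
```

Long lists of short, repetitive values tend to compress well, so the upload is smaller and less likely to be cut off, although the receiving side has to decompress before validating.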

sayamindu commented 9 years ago

Actually, this is a corner case that isn't handled by the project creation refactor that's going out this release. We are getting a truncated JSON file, which means the validation will fail, and the JSON will not be written to our file-store. However, we create the project id in the database before validation, so the project record will exist without any stored JSON, and the project-load failure dialog will continue to show up. This would also explain some of the errors we are getting from our file-upload server for invalid JSON.
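The ordering problem described above can be sketched as follows: if the upload is validated before any project id is created, a truncated body leaves nothing behind. The real server code is not shown in this thread, so `ProjectStore` and `create_project` are hypothetical in-memory stand-ins.

```python
import json


class ProjectStore:
    """Hypothetical in-memory stand-in for the database and the file store."""

    def __init__(self):
        self.records = {}   # project id -> metadata (the "database")
        self.files = {}     # project id -> raw JSON bytes (the "file store")
        self._next_id = 1

    def create_project(self, body: bytes) -> int:
        # Validate first: a truncated upload fails here, before any id exists.
        try:
            json.loads(body)
        except ValueError as exc:  # covers JSONDecodeError and UnicodeDecodeError
            raise ValueError("upload rejected: invalid project JSON") from exc

        # Only a body that parsed cleanly gets an id, a file and a record.
        project_id = self._next_id
        self._next_id += 1
        self.files[project_id] = body
        self.records[project_id] = {"size": len(body)}
        return project_id
```

With this order a failed upload surfaces as an error to the client instead of producing a project that can never be loaded.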

(also, thanks @TheLogFather for pointing this out - this is super helpful)

TheLogFather commented 9 years ago

Just to note, if you haven't read the topic I linked above...

It looks like list info is getting placed in the JSON twice - appearing as part of the stage, as well as whatever else owns it...

That's hardly gonna help the upload issue for long lists!
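To get a feel for how much the duplication costs, the sketch below adds up the JSON size of list contents that appear as entries in the stage's `children` array. The layout assumed here (a `children` entry carrying both `listName` and `contents`, next to the copy held in the owner's `lists`) is inferred from this discussion and may not match the real project format exactly; `duplicated_list_bytes` is a made-up helper name.

```python
import json


def duplicated_list_bytes(stage):
    """Return the number of JSON characters spent on list contents stored in 'children'."""
    total = 0
    for child in stage.get("children", []):
        # Assumption: a child with both keys is the on-stage copy of a list.
        if "listName" in child and "contents" in child:
            total += len(json.dumps(child["contents"]))
    return total


contents = ["1", "2", "3"]
stage = {
    "objName": "Stage",
    "children": [
        {"objName": "Sprite1", "lists": [{"listName": "data", "contents": contents}]},
        {"listName": "data", "contents": contents, "visible": False},
    ],
}
wasted = duplicated_list_bytes(stage)
```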

griffpatch commented 9 years ago

Yeah... I reported that 10 months ago ;) (Sent by email on 19 Jan 2015, in reply to the comment above.)

2jour commented 9 years ago

@sayamindu do we need to put a fix in before the release?

sayamindu commented 9 years ago

@2jour Not for this one - it will require a bit of planning.

However, please prioritize the bug where lists are stored twice in the project data (the one that @griffpatch originally reported) - we should aim to get that fixed for next release.

2jour commented 9 years ago

Thanks @TheLogFather for reporting this issue.

We have created a separate issue for lists being stored twice in JSON https://github.com/LLK/scratch-flash/issues/591

kaschm commented 9 years ago

Looks like this is a larger piece of work involving compressing the JSON. We'll go ahead and fix https://github.com/LLK/scratch-flash/issues/591 in the meantime.

MegaApuTurkUltra commented 9 years ago

I think I've posted about this before on the forums, asking for JSON compression. My "ApuBeaTS Music Showcase" project had a list of tens of thousands of items, and saving was pretty unreliable.

PolyEdge commented 9 years ago

I wonder if the one in "children" is for the stage to use to display, and the one in the sprite is for the sprite to do computations with. So you could wreck scratch by getting those lists out of sync...

GarethPW commented 8 years ago

(Note that saving long variables doesn't seem to cause the same trouble.)

Actually, I've found that it does. I have an 11.2 MB project with two variables of size ~5000000. The project refuses to save with these variables present. I've tried saving the project without the variables and then updating the project's JSON via http://projects.scratch.mit.edu/internalapi/project/xxxxxxxxx/set/. However, this results in an HTTP 413 error due to the size of the payload (even though it should be smaller than 50 MB).

TheLogFather commented 8 years ago

Yes, I've noticed that once you get to ~10MBytes of JSON then it fails to save. That's why I had to clip some of the frames from my 'Star Wars special' VidPlayer project (which has a single long variable to store the frame data).

I'd assumed that 10MB was meant to be the limit, but you say it's 50MB...? Maybe it's 50 for the whole project, but there's also a limit of 10 for the JSON? Can an ST dev confirm/deny whether this was the actual intention?

Of course, the problem is particularly acute with lists, since the double-saving (#591) means you hit the 10MB limit way sooner... :/
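A back-of-the-envelope sketch of that effect: if every list item costs a fixed number of JSON bytes and each list is written twice, the item count at which the JSON reaches the limit is halved. The 10 MB figure is the approximate threshold observed in this thread, not a confirmed limit, and `max_list_items` is a hypothetical helper.

```python
# Assumption: ~10 MB of JSON, as observed above (exact value unconfirmed).
JSON_LIMIT_BYTES = 10 * 1024 * 1024


def max_list_items(bytes_per_item, copies=1, limit=JSON_LIMIT_BYTES):
    """Largest item count whose JSON stays within the limit when each list is stored `copies` times."""
    return limit // (bytes_per_item * copies)


single = max_list_items(bytes_per_item=10, copies=1)   # list stored once
doubled = max_list_items(bytes_per_item=10, copies=2)  # list stored twice (#591)
```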

GarethPW commented 8 years ago

@TheLogFather My project was actually based on your Star Wars project. I was trying to import a music video, along with sound, into it but I ran into the same problem that you clearly faced. This is why I tried uploading via the API; I figured that I could upload the JSON without the unnecessary duplicate taking up space.

If the limit is 10MB of JSON, then that would help make more sense of this. In addition, perhaps it would be logical to increase that limit but keep the 50 MB total cap.

PolyEdge commented 8 years ago

... however this results in an HTTP 413 error due to the size of the payload (even though it should be smaller than 50 MB).

The rule is 50MB per project, 10MB per asset. The asset rule is not a good way of stopping spam-upload because when you create an asset, you do not really own it, so you can't trace an asset back to a project 👎

Plus the 50MB limit is client-side because the server does not have the time and resources to scan your project and see how much space assets are taking. Scratch is basically providing a free JSON storage API.
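A client-side check along the lines described above might look like the sketch below. The 50 MB and 10 MB figures are the ones quoted in this comment and were not confirmed by the Scratch Team in the thread; treating the project JSON as one asset is also an assumption, and `size_problems` is a hypothetical helper.

```python
MB = 1024 * 1024
PROJECT_LIMIT = 50 * MB  # per-project limit as quoted above (unconfirmed)
ASSET_LIMIT = 10 * MB    # per-asset limit as quoted above (unconfirmed)


def size_problems(asset_sizes):
    """Return a list of limit violations for a mapping of asset name -> size in bytes."""
    problems = [
        f"{name} exceeds the per-asset limit"
        for name, size in asset_sizes.items()
        if size > ASSET_LIMIT
    ]
    if sum(asset_sizes.values()) > PROJECT_LIMIT:
        problems.append("project exceeds the total limit")
    return problems


# Example: a JSON file slightly over 10 MB alongside a small costume.
report = size_problems({"project.json": 11 * MB, "costume.png": 1 * MB})
```

Running such a check before uploading would report the oversized JSON up front rather than after a failed save or an HTTP 413 response.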