Comments - Githubissues

streetcomplete / sc-photo-service

Photo upload service for StreetComplete

MIT License

16 stars 5 forks source link

Comments #1

Closed westnordost closed 6 years ago

westnordost commented 6 years ago

Looks very good! Can't wait to see it in action!

A few comments. I'll just cluster this in one ticket now:

I am not sure why there needs to be JSON at all. Doesn't it make things more complicated?: upload.php - could simply return the url in plaintext or a normal HTTP error code + message on failure activate.php - could have a normal POST parameter for the note id and the reply is not interesting for the client (=will not be parsed), perhaps if there is an error. The error could be a normal HTTP error code + message Generally, it seems a little weird to get a HTTP OK 200 on an error
Posting raw data to upload the picture - is this normal? I thought it would be always multipart/form-data for file upload. Not sure if this changes anything.
file upload limit: I would very much like a configurable file upload limit. On my website, the maximum upload file size in PHP.ini is 128MB and I do not want to change the default there to something as low as would be a good setting for this service (a few MB)
Did you handle the case that a photo has already been moved from the tmp directory to the proper one? activate.php should in that case just return "OK", no error.
Did you handle the case that the to-be-activated new picture is in one of the note comments and earlier comments or the first post in the thread also had links which are now dead because the note had been redacted/closed in the meantime but reopened later? (activate.php should still return "OK" and activate just the new one, ignore the rest)
Photos of redacted notes should be cleaned up without considering MAX_LIFETIME_AFTER_NOTE_CLOSED_DAYS
How well does the service handle if I would go about deleting this and that image manually from the filesystem (without touching the DB)?
Why is the file extension saved to DB? (Because it is in the filesystem already). Hmm and actually, why is the id even saved to DB? - It's also in the name
For making the file name as short as possible, perhaps a-z could also be used to count-up the filename.
The service must be prepared to make all calls to the OSM Notes API with a logged in user, specifyable in the config. (because of GDPR / DSGVO the notes api will perhaps not be allowed to be used as anonymous)
What seems to be missing is that the cleanup script also starts deleting photos if the max directory size exceeds the configured limit (oldest first)
Finally, double check everything regarding not accepting parameters as-is from the user and preventing the possibility of a SQL injection?

exploide commented 6 years ago

Before answering to all the comments, let me transform this into a checklist, so I can keep track of it.

[x] 1. Why JSON; error handling
[x] 2. Posting raw data
[x] 3. File upload limit
[x] 4. Case that file has already been moved
[x] 5. Case that multiple photo URLs are in note
[x] 6. MAX_LIFETIME_AFTER_NOTE_CLOSED_DAYS is undesired; redacted notes
[x] 7. Robustness regarding manual file deletion
[x] 8. Why saving file id and extension in database
[x] 9. Making file name even shorter
[x] 10. Authenticated API calls
[x] 11. Max directory size
[x] 12. Checked against SQL injections and similar

exploide commented 6 years ago

Thanks, glad to hear that. Now let's talk about your comments.

I am not sure why there needs to be JSON at all. Doesn't it make things more complicated?: upload.php - could simply return the url in plaintext or a normal HTTP error code + message on failure activate.php - could have a normal POST parameter for the note id and the reply is not interesting for the client (=will not be parsed), perhaps if there is an error. The error could be a normal HTTP error code + message Generally, it seems a little weird to get a HTTP OK 200 on an error

No very strong reason why I picked JSON, I just liked it. I can also just return plain text but having a structured output is better prepared if we ever want to return some other data in the future. Also, I wanted it to be consistent across the API. Regarding the HTTP status code: That should be properly set in all cases. Where do you get a 200 even if there is an error? It should always return a meaningful code. If not I have made a mistake and would like to fix that. Back to the JSON: Do you really require changes here? I imagined it wouldn't be too complex to extract a value from a JSON.

Posting raw data to upload the picture - is this normal? I thought it would be always multipart/form-data for file upload. Not sure if this changes anything.

Nah, multipart/form-data is only for data consisting of multiple fields, including files, that are uploaded via an HTML form. It introduces a certain overhead and involves a weird encoding. Additionally, the way I'm doing it here allows to hold the data in memory and only write it to disk when accepted. If you download a file, the content type will specify what data to except, the raw data will follow. It's exactly the same for uploads (except that the content type header should not be trusted). An API typically has no use for multipart encoding.

file upload limit: I would very much like a configurable file upload limit. On my website, the maximum upload file size in PHP.ini is 128MB and I do not want to change the default there to something as low as would be a good setting for this service (a few MB)

All right, I will implement this.

Did you handle the case that a photo has already been moved from the tmp directory to the proper one? activate.php should in that case just return "OK", no error.

I hope you mean moved by a previous call to activate.php not by you fiddling with the file system :D In the desired case it finds that the photo was already activated and returns HTTP code 200 along with something like {'found_photos': 1, 'activated_photos': 0}.

Did you handle the case that the to-be-activated new picture is in one of the note comments and earlier comments or the first post in the thread also had links which are now dead because the note had been redacted/closed in the meantime but reopened later? (activate.php should still return "OK" and activate just the new one, ignore the rest)

Yes, it does that. Leading to an HTTP code 200 and something like {'found_photos': 2, 'activated_photos': 1}.

Photos of redacted notes should be cleaned up without considering MAX_LIFETIME_AFTER_NOTE_CLOSED_DAYS

Really? Since notes are often reactivated within the first few days after they got closed, I think the photos could still be useful. But that's up to you. However, do you really want to have this removed? I mean the feature is already there and you can just set the constant to 0. (Actually I had a bug in that, but it is fixed now.)

How well does the service handle if I would go about deleting this and that image manually from the filesystem (without touching the DB)?

activate.php would return HTTP 500 with {'error': 'Cannot move file'} and abort. cleanup.php would print something like PHP Warning: unlink(../photos_tmp/18.jpg): No such file or directory and go ahead. The corresponding database entry will only be deleted when it's due because of the currently implemented cleanup rules. Do you want that something different happens?

Why is the file extension saved to DB? (Because it is in the filesystem already). Hmm and actually, why is the id even saved to DB? - It's also in the name

Well, if I remove file_id and file_ext, the only columns left will be creation_time and note_id. Where is then the association between photo and note? Maybe it is possible to change everything in a way that the file name is solely extracted from the OSM note. But I think relying on information only available remotely is not so nice. Additionally, this would require more parsing and user-supplied input validation. I consider it easier and probably cleaner to do this how it currently happens. Is this a no-go for you?

For making the file name as short as possible, perhaps a-z could also be used to count-up the filename.

I have to say that I'm not excited about this. I thought about file name generation. At first I had 16 byte random values for collision free identification. But then I also had the opinion that short file names would be great. Therefore, I came up with just using the incremental primary key ID of the database entry. Changing this to also include other characters may be much more work to implement. The encoding would probably be needed to be done manually, if hex or base64 are not enough. Besides, the service works more or less based on the fact that the file has a numeric ID. I think several parts would need to change. And the current file names are not utterly long either. Is this change really important for you?

The service must be prepared to make all calls to the OSM Notes API with a logged in user, specifyable in the config. (because of GDPR / DSGVO the notes api will perhaps not be allowed to be used as anonymous)

Oh, yes. I will implement this.

What seems to be missing is that the cleanup script also starts deleting photos if the max directory size exceeds the configured limit (oldest first)

Right, as stated in the SC issue, this was deliberately omitted, because I was not sure if you really need this considering the amount of uploaded photos in the past. But since you seem to want that feature, I can implement it.

Finally, double check everything regarding not accepting parameters as-is from the user and preventing the possibility of a SQL injection?

Already did that. At least it is my job finding such vulnerabilities ;) But if I overlooked something, what certainly could happen, let me know immediately, so I can fix that.

That was much...

westnordost commented 6 years ago

Regarding 6: there is a difference between closed (=resolved) notes and redacted (=hidden) notes. That the latter exist is a relatively new features and enables moderators to hide/censor spam or otherwise inappropriate content by completely hiding them from non-moderators. I am not sure about the Http code that is returned when querying a note like this by id - if 404 or something else

Am 30. August 2018 19:38:14 MESZ schrieb Jannik Vieten notifications@github.com:

Thanks, glad to hear that. Now let's talk about your comments.

I am not sure why there needs to be JSON at all. Doesn't it make things more complicated?: upload.php - could simply return the url in plaintext or a normal HTTP error code + message on failure activate.php - could have a normal POST parameter for the note id and the reply is not interesting for the client (=will not be parsed), perhaps if there is an error. The error could be a normal HTTP error code + message Generally, it seems a little weird to get a HTTP OK 200 on an error

No very strong reason why I picked JSON, I just liked it. I can also just return plain text but having a structured output is better prepared if we ever want to return some other data in the future. Also, I wanted it to be consistent across the API. Regarding the HTTP status code: That should be properly set in all cases. Where do you get a 200 even if there is an error? It should always return a meaningful code. If not I have made a mistake and would like to fix that. Back to the JSON: Do you really require changes here? I imagined it wouldn't be too complex to extract a value from a JSON.

Posting raw data to upload the picture - is this normal? I thought it would be always multipart/form-data for file upload. Not sure if this changes anything.

Nah, multipart/form-data is only for data consisting of multiple fields, including files, that are uploaded via an HTML form. It introduces a certain overhead and involves a weird encoding. Additionally, the way I'm doing it here allows to hold the data in memory and only write it to disk when accepted. If you download a file, the content type will specify what data to except, the raw data will follow. It's exactly the same for uploads (except that the content type header should not be trusted). An API typically has no use for multipart encoding.

file upload limit: I would very much like a configurable file upload limit. On my website, the maximum upload file size in PHP.ini is 128MB and I do not want to change the default there to something as low as would be a good setting for this service (a few MB)

All right, I will implement this.

Did you handle the case that a photo has already been moved from the tmp directory to the proper one? activate.php should in that case just return "OK", no error.

I hope you mean moved by a previous call to activate.php not by you fiddling with the file system :D In the desired case it finds that the photo was already activated and returns HTTP code 200 along with something like {'found_photos': 1, 'activated_photos': 0}.

Did you handle the case that the to-be-activated new picture is in one of the note comments and earlier comments or the first post in the thread also had links which are now dead because the note had been redacted/closed in the meantime but reopened later? (activate.php should still return "OK" and activate just the new one, ignore the rest)

Yes, it does that. Leading to an HTTP code 200 and something like {'found_photos': 2, 'activated_photos': 1}.

Photos of redacted notes should be cleaned up without considering MAX_LIFETIME_AFTER_NOTE_CLOSED_DAYS

Really? Since notes are often reactivated within the first few days after they got closed, I think the photos could still be useful. But that's up to you. However, do you really want to have this removed? I mean the feature is already there and you can just set the constant to

(Actually I had in bug in that, but it is fixed now.)

How well does the service handle if I would go about deleting this and that image manually from the filesystem (without touching the DB)?

activate.php would return HTTP 500 with {'error': 'Cannot move file'} and abort. cleanup.php would print something like PHP Warning: unlink(../photos_tmp/18.jpg): No such file or directory and go ahead. The corresponding database entry will only be deleted when it's due because of the currently implemented cleanup rules. Do you want that something different happens?

Why is the file extension saved to DB? (Because it is in the filesystem already). Hmm and actually, why is the id even saved to DB?

It's also in the name

Well, if I remove file_id and file_ext, the only columns left will be creation_time and note_id. Where is then the association between photo and note? Maybe it is possible to change everything in a way that the file name is solely extracted from the OSM note. But I think relying on information only available remotely is not so nice. Additionally, this would require more parsing and user-supplied input validation. I consider it easier and probably cleaner to do this how it currently happens. Is this a no-go for you?

For making the file name as short as possible, perhaps a-z could also be used to count-up the filename.

I have to say that I'm not excited about this. I thought about file name generation. At first I had 32 byte random values for collision free identification. But then I also had the opinion that short file names would be great. Therefore, I came up with just using the incremental primary key ID of the database entry. Changing this to also include other characters may be much more work to implement. The encoding would probably be needed to be done manually, if hex or base64 are not enough. Besides, the service works more or less based on the fact that the file has a numeric ID. I think several parts would need to change. And the current file names are not utterly long either. Is this change really important for you?

The service must be prepared to make all calls to the OSM Notes API with a logged in user, specifyable in the config. (because of GDPR / DSGVO the notes api will perhaps not be allowed to be used as anonymous)

Oh, yes. I will implement this.

What seems to be missing is that the cleanup script also starts deleting photos if the max directory size exceeds the configured limit (oldest first)

Right, as stated in the SC issue, this was deliberately omitted, because I was not sure if you really need this considering the amount of uploaded photos in the past. But since you seem to want that feature, I can implement it.

Finally, double check everything regarding not accepting parameters as-is from the user and preventing the possibility of a SQL injection?

Already did that. At least it is my job finding such vulnerabilities ;) But if I overlooked something, what certainly could happen, let me know immediately, so I can fix that.

That was much...

westnordost commented 6 years ago

Regarding 7, it is just important that the service will not get into some kind of undefined or invalid state but keeps working as expected l. Noting missing pics as warnings is fine

Am 30. August 2018 19:38:14 MESZ schrieb Jannik Vieten notifications@github.com:

Thanks, glad to hear that. Now let's talk about your comments.

I am not sure why there needs to be JSON at all. Doesn't it make things more complicated?: upload.php - could simply return the url in plaintext or a normal HTTP error code + message on failure activate.php - could have a normal POST parameter for the note id and the reply is not interesting for the client (=will not be parsed), perhaps if there is an error. The error could be a normal HTTP error code + message Generally, it seems a little weird to get a HTTP OK 200 on an error

No very strong reason why I picked JSON, I just liked it. I can also just return plain text but having a structured output is better prepared if we ever want to return some other data in the future. Also, I wanted it to be consistent across the API. Regarding the HTTP status code: That should be properly set in all cases. Where do you get a 200 even if there is an error? It should always return a meaningful code. If not I have made a mistake and would like to fix that. Back to the JSON: Do you really require changes here? I imagined it wouldn't be too complex to extract a value from a JSON.

Posting raw data to upload the picture - is this normal? I thought it would be always multipart/form-data for file upload. Not sure if this changes anything.

Nah, multipart/form-data is only for data consisting of multiple fields, including files, that are uploaded via an HTML form. It introduces a certain overhead and involves a weird encoding. Additionally, the way I'm doing it here allows to hold the data in memory and only write it to disk when accepted. If you download a file, the content type will specify what data to except, the raw data will follow. It's exactly the same for uploads (except that the content type header should not be trusted). An API typically has no use for multipart encoding.

file upload limit: I would very much like a configurable file upload limit. On my website, the maximum upload file size in PHP.ini is 128MB and I do not want to change the default there to something as low as would be a good setting for this service (a few MB)

All right, I will implement this.

Did you handle the case that a photo has already been moved from the tmp directory to the proper one? activate.php should in that case just return "OK", no error.

I hope you mean moved by a previous call to activate.php not by you fiddling with the file system :D In the desired case it finds that the photo was already activated and returns HTTP code 200 along with something like {'found_photos': 1, 'activated_photos': 0}.

Did you handle the case that the to-be-activated new picture is in one of the note comments and earlier comments or the first post in the thread also had links which are now dead because the note had been redacted/closed in the meantime but reopened later? (activate.php should still return "OK" and activate just the new one, ignore the rest)

Yes, it does that. Leading to an HTTP code 200 and something like {'found_photos': 2, 'activated_photos': 1}.

Photos of redacted notes should be cleaned up without considering MAX_LIFETIME_AFTER_NOTE_CLOSED_DAYS

Really? Since notes are often reactivated within the first few days after they got closed, I think the photos could still be useful. But that's up to you. However, do you really want to have this removed? I mean the feature is already there and you can just set the constant to

(Actually I had in bug in that, but it is fixed now.)

How well does the service handle if I would go about deleting this and that image manually from the filesystem (without touching the DB)?

activate.php would return HTTP 500 with {'error': 'Cannot move file'} and abort. cleanup.php would print something like PHP Warning: unlink(../photos_tmp/18.jpg): No such file or directory and go ahead. The corresponding database entry will only be deleted when it's due because of the currently implemented cleanup rules. Do you want that something different happens?

Why is the file extension saved to DB? (Because it is in the filesystem already). Hmm and actually, why is the id even saved to DB?

It's also in the name

Well, if I remove file_id and file_ext, the only columns left will be creation_time and note_id. Where is then the association between photo and note? Maybe it is possible to change everything in a way that the file name is solely extracted from the OSM note. But I think relying on information only available remotely is not so nice. Additionally, this would require more parsing and user-supplied input validation. I consider it easier and probably cleaner to do this how it currently happens. Is this a no-go for you?

For making the file name as short as possible, perhaps a-z could also be used to count-up the filename.

I have to say that I'm not excited about this. I thought about file name generation. At first I had 32 byte random values for collision free identification. But then I also had the opinion that short file names would be great. Therefore, I came up with just using the incremental primary key ID of the database entry. Changing this to also include other characters may be much more work to implement. The encoding would probably be needed to be done manually, if hex or base64 are not enough. Besides, the service works more or less based on the fact that the file has a numeric ID. I think several parts would need to change. And the current file names are not utterly long either. Is this change really important for you?

The service must be prepared to make all calls to the OSM Notes API with a logged in user, specifyable in the config. (because of GDPR / DSGVO the notes api will perhaps not be allowed to be used as anonymous)

Oh, yes. I will implement this.

What seems to be missing is that the cleanup script also starts deleting photos if the max directory size exceeds the configured limit (oldest first)

Right, as stated in the SC issue, this was deliberately omitted, because I was not sure if you really need this considering the amount of uploaded photos in the past. But since you seem to want that feature, I can implement it.

Finally, double check everything regarding not accepting parameters as-is from the user and preventing the possibility of a SQL injection?

Already did that. At least it is my job finding such vulnerabilities ;) But if I overlooked something, what certainly could happen, let me know immediately, so I can fix that.

That was much...

westnordost commented 6 years ago

Regarding 8: no, just asking. Specifically I wondered about the file extension.

Am 30. August 2018 19:38:14 MESZ schrieb Jannik Vieten notifications@github.com:

Thanks, glad to hear that. Now let's talk about your comments.

I am not sure why there needs to be JSON at all. Doesn't it make things more complicated?: upload.php - could simply return the url in plaintext or a normal HTTP error code + message on failure activate.php - could have a normal POST parameter for the note id and the reply is not interesting for the client (=will not be parsed), perhaps if there is an error. The error could be a normal HTTP error code + message Generally, it seems a little weird to get a HTTP OK 200 on an error

No very strong reason why I picked JSON, I just liked it. I can also just return plain text but having a structured output is better prepared if we ever want to return some other data in the future. Also, I wanted it to be consistent across the API. Regarding the HTTP status code: That should be properly set in all cases. Where do you get a 200 even if there is an error? It should always return a meaningful code. If not I have made a mistake and would like to fix that. Back to the JSON: Do you really require changes here? I imagined it wouldn't be too complex to extract a value from a JSON.

Posting raw data to upload the picture - is this normal? I thought it would be always multipart/form-data for file upload. Not sure if this changes anything.

Nah, multipart/form-data is only for data consisting of multiple fields, including files, that are uploaded via an HTML form. It introduces a certain overhead and involves a weird encoding. Additionally, the way I'm doing it here allows to hold the data in memory and only write it to disk when accepted. If you download a file, the content type will specify what data to except, the raw data will follow. It's exactly the same for uploads (except that the content type header should not be trusted). An API typically has no use for multipart encoding.

file upload limit: I would very much like a configurable file upload limit. On my website, the maximum upload file size in PHP.ini is 128MB and I do not want to change the default there to something as low as would be a good setting for this service (a few MB)

All right, I will implement this.

Did you handle the case that a photo has already been moved from the tmp directory to the proper one? activate.php should in that case just return "OK", no error.

I hope you mean moved by a previous call to activate.php not by you fiddling with the file system :D In the desired case it finds that the photo was already activated and returns HTTP code 200 along with something like {'found_photos': 1, 'activated_photos': 0}.

Did you handle the case that the to-be-activated new picture is in one of the note comments and earlier comments or the first post in the thread also had links which are now dead because the note had been redacted/closed in the meantime but reopened later? (activate.php should still return "OK" and activate just the new one, ignore the rest)

Yes, it does that. Leading to an HTTP code 200 and something like {'found_photos': 2, 'activated_photos': 1}.

Photos of redacted notes should be cleaned up without considering MAX_LIFETIME_AFTER_NOTE_CLOSED_DAYS

Really? Since notes are often reactivated within the first few days after they got closed, I think the photos could still be useful. But that's up to you. However, do you really want to have this removed? I mean the feature is already there and you can just set the constant to

(Actually I had in bug in that, but it is fixed now.)

How well does the service handle if I would go about deleting this and that image manually from the filesystem (without touching the DB)?

activate.php would return HTTP 500 with {'error': 'Cannot move file'} and abort. cleanup.php would print something like PHP Warning: unlink(../photos_tmp/18.jpg): No such file or directory and go ahead. The corresponding database entry will only be deleted when it's due because of the currently implemented cleanup rules. Do you want that something different happens?

Why is the file extension saved to DB? (Because it is in the filesystem already). Hmm and actually, why is the id even saved to DB?

It's also in the name

Well, if I remove file_id and file_ext, the only columns left will be creation_time and note_id. Where is then the association between photo and note? Maybe it is possible to change everything in a way that the file name is solely extracted from the OSM note. But I think relying on information only available remotely is not so nice. Additionally, this would require more parsing and user-supplied input validation. I consider it easier and probably cleaner to do this how it currently happens. Is this a no-go for you?

For making the file name as short as possible, perhaps a-z could also be used to count-up the filename.

I have to say that I'm not excited about this. I thought about file name generation. At first I had 32 byte random values for collision free identification. But then I also had the opinion that short file names would be great. Therefore, I came up with just using the incremental primary key ID of the database entry. Changing this to also include other characters may be much more work to implement. The encoding would probably be needed to be done manually, if hex or base64 are not enough. Besides, the service works more or less based on the fact that the file has a numeric ID. I think several parts would need to change. And the current file names are not utterly long either. Is this change really important for you?

The service must be prepared to make all calls to the OSM Notes API with a logged in user, specifyable in the config. (because of GDPR / DSGVO the notes api will perhaps not be allowed to be used as anonymous)

Oh, yes. I will implement this.

What seems to be missing is that the cleanup script also starts deleting photos if the max directory size exceeds the configured limit (oldest first)

Right, as stated in the SC issue, this was deliberately omitted, because I was not sure if you really need this considering the amount of uploaded photos in the past. But since you seem to want that feature, I can implement it.

Finally, double check everything regarding not accepting parameters as-is from the user and preventing the possibility of a SQL injection?

Already did that. At least it is my job finding such vulnerabilities ;) But if I overlooked something, what certainly could happen, let me know immediately, so I can fix that.

That was much...

westnordost commented 6 years ago

Regarding 9: no, not that important. Though, the shorter the better, and, hex IS numerical, isn't it?

Am 30. August 2018 19:38:14 MESZ schrieb Jannik Vieten notifications@github.com:

Thanks, glad to hear that. Now let's talk about your comments.

I am not sure why there needs to be JSON at all. Doesn't it make things more complicated?: upload.php - could simply return the url in plaintext or a normal HTTP error code + message on failure activate.php - could have a normal POST parameter for the note id and the reply is not interesting for the client (=will not be parsed), perhaps if there is an error. The error could be a normal HTTP error code + message Generally, it seems a little weird to get a HTTP OK 200 on an error

No very strong reason why I picked JSON, I just liked it. I can also just return plain text but having a structured output is better prepared if we ever want to return some other data in the future. Also, I wanted it to be consistent across the API. Regarding the HTTP status code: That should be properly set in all cases. Where do you get a 200 even if there is an error? It should always return a meaningful code. If not I have made a mistake and would like to fix that. Back to the JSON: Do you really require changes here? I imagined it wouldn't be too complex to extract a value from a JSON.

Posting raw data to upload the picture - is this normal? I thought it would be always multipart/form-data for file upload. Not sure if this changes anything.

Nah, multipart/form-data is only for data consisting of multiple fields, including files, that are uploaded via an HTML form. It introduces a certain overhead and involves a weird encoding. Additionally, the way I'm doing it here allows to hold the data in memory and only write it to disk when accepted. If you download a file, the content type will specify what data to except, the raw data will follow. It's exactly the same for uploads (except that the content type header should not be trusted). An API typically has no use for multipart encoding.

file upload limit: I would very much like a configurable file upload limit. On my website, the maximum upload file size in PHP.ini is 128MB and I do not want to change the default there to something as low as would be a good setting for this service (a few MB)

All right, I will implement this.

Did you handle the case that a photo has already been moved from the tmp directory to the proper one? activate.php should in that case just return "OK", no error.

I hope you mean moved by a previous call to activate.php not by you fiddling with the file system :D In the desired case it finds that the photo was already activated and returns HTTP code 200 along with something like {'found_photos': 1, 'activated_photos': 0}.

Did you handle the case that the to-be-activated new picture is in one of the note comments and earlier comments or the first post in the thread also had links which are now dead because the note had been redacted/closed in the meantime but reopened later? (activate.php should still return "OK" and activate just the new one, ignore the rest)

Yes, it does that. Leading to an HTTP code 200 and something like {'found_photos': 2, 'activated_photos': 1}.

Photos of redacted notes should be cleaned up without considering MAX_LIFETIME_AFTER_NOTE_CLOSED_DAYS

Really? Since notes are often reactivated within the first few days after they got closed, I think the photos could still be useful. But that's up to you. However, do you really want to have this removed? I mean the feature is already there and you can just set the constant to

(Actually I had in bug in that, but it is fixed now.)

How well does the service handle if I would go about deleting this and that image manually from the filesystem (without touching the DB)?

activate.php would return HTTP 500 with {'error': 'Cannot move file'} and abort. cleanup.php would print something like PHP Warning: unlink(../photos_tmp/18.jpg): No such file or directory and go ahead. The corresponding database entry will only be deleted when it's due because of the currently implemented cleanup rules. Do you want that something different happens?

Why is the file extension saved to DB? (Because it is in the filesystem already). Hmm and actually, why is the id even saved to DB?

It's also in the name

Well, if I remove file_id and file_ext, the only columns left will be creation_time and note_id. Where is then the association between photo and note? Maybe it is possible to change everything in a way that the file name is solely extracted from the OSM note. But I think relying on information only available remotely is not so nice. Additionally, this would require more parsing and user-supplied input validation. I consider it easier and probably cleaner to do this how it currently happens. Is this a no-go for you?

For making the file name as short as possible, perhaps a-z could also be used to count-up the filename.

I have to say that I'm not excited about this. I thought about file name generation. At first I had 32 byte random values for collision free identification. But then I also had the opinion that short file names would be great. Therefore, I came up with just using the incremental primary key ID of the database entry. Changing this to also include other characters may be much more work to implement. The encoding would probably be needed to be done manually, if hex or base64 are not enough. Besides, the service works more or less based on the fact that the file has a numeric ID. I think several parts would need to change. And the current file names are not utterly long either. Is this change really important for you?

The service must be prepared to make all calls to the OSM Notes API with a logged in user, specifyable in the config. (because of GDPR / DSGVO the notes api will perhaps not be allowed to be used as anonymous)

Oh, yes. I will implement this.

What seems to be missing is that the cleanup script also starts deleting photos if the max directory size exceeds the configured limit (oldest first)

Right, as stated in the SC issue, this was deliberately omitted, because I was not sure if you really need this considering the amount of uploaded photos in the past. But since you seem to want that feature, I can implement it.

Finally, double check everything regarding not accepting parameters as-is from the user and preventing the possibility of a SQL injection?

Already did that. At least it is my job finding such vulnerabilities ;) But if I overlooked something, what certainly could happen, let me know immediately, so I can fix that.

That was much...

exploide commented 6 years ago

Regarding 6: there is a difference between closed (=resolved) notes and redacted (=hidden) notes. That the latter exist is a relatively new features and enables moderators to hide/censor spam or otherwise inappropriate content by completely hiding them from non-moderators. I am not sure about the Http code that is returned when querying a note like this by id - if 404 or something else

Oh, for some reason I missed the word 'redacted' when reading your comment. I already implemented this at https://github.com/exploide/sc-photo-service/blob/master/cleanup.php#L23 I searched for such a note on the forums and it seems to be 410 Gone.

Regarding 7, it is just important that the service will not get into some kind of undefined or invalid state but keeps working as expected l. Noting missing pics as warnings is fine

The only thing I currently can imagine is that you want to activate a note with two (new) photo urls, the first one goes fine while the second one causes the error. The state of the service will not be inconsistent, but the response won't reflect that one photo got successfully activated while the other don't. You currently only see the 500.

Regarding 8: no, just asking. Specifically I wondered about the file extension.

Ok, great. The thing with the file extension is, that I would need to do something like file globbing if I wouldn't know the extension to complete the file name. This way I have everything at hand.

Regarding 9: no, not that important. Though, the shorter the better, and, hex IS numerical, isn't it?

Yes, this "only" would require a conversion when specifying either database ID or file name. But base 10 vs base 16 doesn't make a big difference in size. After one million photos you saved two characters. I propose keeping it as is for now. If this really becomes an issue we can do a relaunch or something like that :P

exploide commented 6 years ago

The missing features are implemented now.

You can authenticate against the OSM API if you like, but this is optional. This way you can leave username and password set to NULL until it becomes necessary.

Files are rejected during upload if the size is larger then the configured limit.

cleanup.php starts deleting old active photos when the configured limit is exceeded. Currently it won't delete much, only few files more than what's necessary to adhere to the limit. Additionally, this only affects active photos. Inactive photos are purged anyway after $MAX_TMP_LIFETIME_HOURS.

Now the only open points I have:

Is it fine keeping the JSON output?

Should something change concerning point 7 (manual deletion robustness) in contrast to what I wrote in https://github.com/exploide/sc-photo-service/issues/1#issuecomment-417431387 ?

westnordost commented 6 years ago

Regarding point 7, it seems fine.

Regarding JSON input and output (the former is necessary as well, isn't it?), I would prefer it but do not insist on it.

exploide commented 6 years ago

True, it is. Ok, I would say let's keep it for now. If you come back and say this really complicates things for this and that reason, then it may change :P