Open josecelano opened 6 months ago
If we don't want unregistered users to see the images, should we handle it on the API side instead of the GUI side? The API will return sanitized descriptions.
If we still want to handle it on the GUI, I think we can sanitize all the descriptions in the list on page load or in the background, and display the sanitized descriptions.
Hi @ngthhu,
Each option has its pros and cons.
If we don't want unregistered users to see the images, should we handle it on the API side instead of the GUI side? The API will return sanitized descriptions.
Cons:
The endpoint contains the original URL:
{
"torrent_id": 2,
"uploader": "admin",
"info_hash": "443c7602b4fde83d1154d6d9da48808418b181b6",
"title": "Ubuntu",
"description": "Ubuntu\n\n![Ubuntu logo](https://logos-world.net/wp-content/uploads/2020/11/Ubuntu-Logo-2004-2010.png)",
"category_id": 5,
"date_uploaded": "2024-03-14 07:55:23",
"file_size": 4932407296,
"seeders": 4,
"leechers": 0,
"name": "ubuntu-23.04-desktop-amd64.iso",
"comment": "Ubuntu CD releases.ubuntu.com",
"creation_date": 1681992794,
"created_by": "mktorrent 1.1",
"encoding": null
}
The server must always sanitise the description field in any SQL query. If you forget to sanitize the field in one case, it could lead to some holes. That could also happen in the front end, but we use a Markdown
component that always sanitizes the description. It could happen if we use the description with another component.
proxy
API context would be coupled to the rest of the API. This doesn't have to be a problem. Just to consider it. Now, the frontend is coupled to the image proxy.Pros:
If we still want to handle it on the GUI, I think we can sanitize all the descriptions in the list on page load or in the background, and display the sanitized descriptions.
We are sanitizing the description in the Markdown component.
Cons:
description
, you need to remember to sanitize it even if it's not used in any component (it could in the future).Pros:
I apologize for the delayed response.
the users can just copy/paste the original URL and load it if they want.
So we allow unregistered users to access the images original URLs. I thought they should not access those URLs at all.
We have to sanitize even if the data we fetch could not be rendered at all.
I agree with this. However, as a user, I will often click on a list item to expand and collapse it multiple times. It would be more efficient to sanitize the descriptions beforehand.
I think that could be harder to maintain because the place where you need sanitization and the place where you do it are not the same place and are not done at the same time. I mean, if you add a new endpoint with the description, you need to remember to sanitize it even if it's not used in any component (it could in the future).
We can store the sanitized descriptions in the torrent object every time we fetch any data containing the descriptions from the API. I apologize if I have misunderstood your idea.
I apologize for the delayed response.
No worries.
the users can just copy/paste the original URL and load it if they want.
So we allow unregistered users to access the images original URLs. I thought they should not access those URLs at all.
Not directly. I meant you can always see what comes from the API. But maybe that could be a good idea. Maybe we could add an option in the future to show the description as plain text like when you are editing. But for now It does not make sense.
We have to sanitize even if the data we fetch could not be rendered at all.
I agree with this. However, as a user, I will often click on a list item to expand and collapse it multiple times. It would be more efficient to sanitize the descriptions beforehand.
OK, I have not thought about that. That's a good reason to do it when we fetch. However, if the page size is big, the user could only open some rows.
I think that could be harder to maintain because the place where you need sanitization and the place where you do it are not the same place and are not done at the same time. I mean, if you add a new endpoint with the description, you need to remember to sanitize it even if it's not used in any component (it could in the future).
We can store the sanitized descriptions in the torrent object every time we fetch any data containing the descriptions from the API. I apologize if I have misunderstood your idea.
I understood your idea, but I did not like changing the torrent object just because, in one view, you need to sanitize it. But now I understand your reason. But I think we need a new object (SanitizedTorrent). I think we should not change the Torrent object because, in future features, we could need the original description
content. And that's an object wrapping the API response. Maybe it's also a good idea not to depend directly on the API response.
Conclusion: I agree, we can sanitize in the object we pass around with the torrent indo. However I'm not sure yet if we should overwrite the value for description. I have the feeling that we have a different object with a different responsibility.
SanitizedTorrent
or we can call it Torrent anyway but in a different namespace. We can have a module which is wrapper for the API.I hope that does not sound too complicated :-)
We can store the sanitized descriptions in the torrent object every time we fetch any data containing the descriptions from the API. I apologize if I have misunderstood your idea.
My idea was to process and add a field called
sanitizedDescription
to the object on the UI side after receiving the object from the API, while retaining the originaldescription
field.
I don't know if it's redundant when that process is handled by the API side.
We can store the sanitized descriptions in the torrent object every time we fetch any data containing the descriptions from the API. I apologize if I have misunderstood your idea.
My idea was to process and add a field called
sanitizedDescription
to the object on the UI side after receiving the object from the API, while retaining the originaldescription
field.I don't know if it's redundant when that process is handled by the API side.
Okay, I'm starting to think we should do it in the backend and frontend, even if it's redundant. The client might have JS disabled, and it could load the original image. In that case, I would only use the description
field. We could assume that the field is always sanitized.
If we sanitize on the backend we could do it:
We must replace image URLs with the proxied ones when we fetch the data because the proxy URL could change.
I guess I've changed my mind and now I totally agree with what you commented here :-)
The user would not know the original URL. We want to preserve privacy, but with the current solution, the users can just copy/paste the original URL and load it if they want.
What I said is not true. The proxied image URL contains the original image URL, so the user can see the image anyway if we sanitize it in the backend.
I like this option because we minimize variables containing wrong values and it's simpler. My problem was I had in mind the idea that sanitization should not be done in the backend. But I was probably confused with this other concept:
https://benhoyt.com/writings/dont-sanitize-do-escape/
What you should not do is validate the input. You should store the original input and sanitize depending on the output context.
NOTES:
Thank you for your response @josecelano. I'm a newbie to this proxy cache thing and sanitizing. My suggestion is mostly from the user's point of view.
Thank you for your response @josecelano. I'm a newbie to this proxy cache thing and sanitizing. My suggestion is mostly from the user's point of view.
That was a good suggestion. UX is very important, and waiting for the sanitiser every time you expand an item it's a bad experience.
Relates to: https://github.com/torrust/torrust-index-gui/discussions/519
Images in the torrent description only show if the user is logged in. If the user is not logged in, you see something like:
Images are served using a proxy cache to preserve the user's privacy. If we served images directly from the original URL, a malicious user could track other users' activity by tracking served images.
Since the proxy takes a lot of resources from the server, we decided to assign a quota per user that you can config in the config file:
We also need to identify the user to update the quota.
Docs for image configuration: https://docs.rs/torrust-index/3.0.0-alpha.2/torrust_index/config/struct.ImageCache.html
We should add this explanation to the User Guide and also to the crates docs.
Maybe we should also add a link to the User Guide in the application.