qbittorrent / qBittorrent

qBittorrent BitTorrent client
https://www.qbittorrent.org

Ability to archive RSS JSON as HTML files (to avoid re-downloading older entries and losing older entries to a daily/weekly threshold) #19248

Open GoogleBeEvil opened 1 year ago

GoogleBeEvil commented 1 year ago

Ability to have RSS JSON be archived as HTML. Add a function to export the JSON files to HTML, or to some format which can be imported by another RSS reader.

Elaboration:

Suppose there are 2 feeds on the RSS tab, named A & B.

A accumulates 10000 articles per year (each article contains poster images, a brief introduction, magnets for films, etc.). If I want to change to another RSS reader, I can certainly add the same feeds to it easily, but 9990 articles will be lost. This is because the feed only keeps the latest 10 articles each day, so the new software can never grab the 9990 articles which are out of date.

If qBit can generate a local HTML file (A.html, A.mht or some other format), I will be able to store the 10000 articles in that web archive file; then I can simply drop the *.html files into the browser and view the contents that no longer show up in the daily feed.

AND if feed B's link is broken or the feed stops updating, I can also archive it (and browse it later).

The feature is somewhat like https://web.archive.org, but it stores the archive locally, offline, for RSS.

At first I wanted to request exporting the JSON to a format which other RSS readers can import, but different software uses different formats, so HTML is better.
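To make the request concrete, here is a minimal sketch of the JSON-to-HTML export being asked for. The field names ("title", "link", "description", "date") are assumptions for illustration, not qBittorrent's actual on-disk schema:

```python
import html


def articles_to_html(articles, feed_name):
    """Render a list of article dicts as a standalone HTML page.

    Assumes each article is a dict with "title", "link", "description"
    and "date" keys -- hypothetical field names, not qBittorrent's
    actual JSON layout.
    """
    rows = []
    for a in articles:
        rows.append(
            "<article>"
            f'<h2><a href="{html.escape(a.get("link", ""), quote=True)}">'
            f'{html.escape(a.get("title", "(untitled)"))}</a></h2>'
            f'<p class="date">{html.escape(a.get("date", ""))}</p>'
            f'<div>{a.get("description", "")}</div>'
            "</article>"
        )
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{html.escape(feed_name)}</title></head><body>"
        f"<h1>{html.escape(feed_name)}</h1>"
        + "\n".join(rows)
        + "</body></html>"
    )
```

The resulting file is self-contained and opens in any browser, which is the whole point: the archive outlives both the feed's daily window and the client that produced it.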

luzpaz commented 1 year ago

Propose a rename for ticket: Ability for RSS feeds to be archived (for 3rd party means)

glassez commented 1 year ago

What should be archived? You could just use your RSS feed URLs to add them in another reader...

GoogleBeEvil commented 1 year ago

What should be archived? You could just use your RSS feed URLs to add them in another reader...

But how can I get the expired articles, which no longer exist at the original feed URL?

luzpaz commented 1 year ago

@GoogleBeEvil please take a moment to really elaborate what you are trying to achieve. When we read this ticket, it's not very clear. That's because there are not enough details to understand your motivation.

luzpaz commented 1 year ago

But how can I get the expired articles, which no longer exist at the original feed URL?

@GoogleBeEvil please elaborate

luzpaz commented 1 year ago

bump

luzpaz commented 1 year ago

@GoogleBeEvil please help us understand what you are trying to do. Explain like we're 5 y/o

GoogleBeEvil commented 1 year ago

@GoogleBeEvil please help us understand what you are trying to do. Explain like we're 5 y/o

Sorry for the late reply; let me explain what I mean and why I need this feature:

Suppose there are 2 feeds on the RSS tab, named A and B. A accumulates 10000 articles over the years (each article contains poster images, a brief introduction, magnets for films). If I want to change to another RSS reader, I can certainly add the same feeds to it easily, but 9990 articles will be lost, because the feed only keeps the latest 10 articles each day; the new software can never grab the 9990 articles which are out of date. If qBit can generate a local HTML file (A.html, A.mht or some other format), I will be able to store the 10000 articles in that web archive file; then I can simply drop the *.html files into the browser and view contents that no longer exist in the feed.

AND if feed B's link is broken or the service stops, I can also archive it and browse it later.

The feature is somewhat like https://web.archive.org, but it stores the archive locally, offline, for RSS.

At first I wanted to request exporting the JSON to a format which other RSS readers can import, but different software uses different formats, so HTML is better.

luzpaz commented 1 year ago

Thanks, I cleaned it up a bit and added it to the OP. Hope that is OK with you?

luzpaz commented 1 year ago

@GoogleBeEvil OK, can you propose a UX workflow and how the UI would look for this feature to be implemented?

luzpaz commented 1 year ago

CC @Omar-Abdul-Azeez
Hi, since you are tackling an RSS-related issue, can you weigh in on this one?

Omar-Abdul-Azeez commented 1 year ago

I'm not sure if I'll be able to help with the implementation, but I'll try nonetheless.

For a start, the UI.

  1. 1st approach:
    • For feeds: right click on the feed → Archive/Export → save file dialog (choose name and save path).
    • For folders: same procedure, but open a tree-structured dialog (the same tree as shown in the RSS tab) with the ability to rename feeds and subfolders (via double click or a button), choose the save path, and ideally tick a checkbox next to each feed and folder. This exports a folder hierarchy mirroring the tree structure of the RSS feeds.
    • For a full export: since the root folder isn't right-clickable, or even shown, add a button next to the Update all button which opens the exact same window as for any other folder, but aimed at the root folder instead.
  2. 2nd approach:
    • A button next to the Update all button which does exactly the same thing as the 1st approach, but now with an always-present checkbox next to each feed and folder to choose what to export.

Pros & cons: the 1st approach is more complex but very user-friendly; the 2nd approach is simpler but always exports a hierarchy, even when exporting only a few feeds.
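Either way, the folder-hierarchy export boils down to a recursive walk over the feed tree. A sketch, where the nested-dict tree shape is a hypothetical stand-in for qBittorrent's internal RSS tree, not its actual API:

```python
import html
from pathlib import Path


def export_tree(node, dest):
    """Mirror an RSS folder tree into a directory hierarchy on disk.

    `node` is a hypothetical in-memory tree: a dict mapping names to
    either another dict (a subfolder) or a list of article dicts (a
    feed). Each feed becomes <dest>/<name>.html; each subfolder
    becomes a subdirectory, matching the RSS tab's tree structure.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for name, child in node.items():
        if isinstance(child, dict):        # subfolder: recurse into it
            export_tree(child, dest / name)
        else:                              # feed: write one HTML file
            body = "\n".join(
                f"<p>{html.escape(a.get('title', ''))}</p>" for a in child)
            (dest / f"{name}.html").write_text(
                f"<html><body>{body}</body></html>", encoding="utf-8")
```

For the 2nd approach, the checkbox selection would simply prune `node` before calling this; the walk itself is unchanged.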

As for the serializer code, I've taken a look and here are a few things to note:

  1. The channel's info, as far as I can tell, isn't saved anywhere. Only title and lastBuildDate are loaded into the feed object, and even those don't seem to be saved. This means a working and parsable RSS feed is needed at the time of exporting the history, along with parsing both link and description to populate the required channel elements with correct information.
  2. The items' enclosure element is parsed and loaded into either altTorrentUrl or torrentUrl, depending on its MIME type. The link element is also parsed: if it's a magnet link it's saved into torrentUrl, otherwise into link. In the end, if torrentUrl is empty, it falls back to altTorrentUrl and then link. This means that, depending on which one gets parsed last, the other one will be overwritten, and only torrentUrl and link are saved. Another loss of data.
  3. I've encountered RSS feeds with namespace-prefixed elements (<prefix:element>) rather than just <element>. I'm not sure whether this matters, but that prefix is also not parsed, so that's something.
  4. I don't know of an Atom feed to test on, but after parsing and saving the articles, RSS and Atom don't seem to be distinguishable to me, other than checking the parsed elements and guessing which format they originated from, which doesn't seem feasible. If so, which format to export needs to be decided. Having both would be nice, but is Atom really needed as an export option?

For the first point, I suggest saving the channel's info, since depending on a feed being online in order to save its already-downloaded article history is just silly. For the second: save enclosure into enclosure (in the file) and link always into link, leaving the previous logic for torrentUrl as-is. This way qBit can use torrentUrl as it always has, while also having enclosure and link at hand for exporting. For the 3rd, ignore the extra info; for the 4th, only export RSS.
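Points 1 and 2 amount to keeping enough fields to reproduce a valid RSS 2.0 document on export. A hedged sketch of that serialization (the dict key names mirror the RSS 2.0 element names and are assumptions, not qBittorrent's actual schema):

```python
import xml.etree.ElementTree as ET


def to_rss(channel, articles):
    """Serialize saved channel info plus articles back into RSS 2.0 XML.

    `channel` and each article are plain dicts; keys like "title",
    "link", "description", "pubDate" and "enclosure" are hypothetical
    stand-ins for whatever the serializer would actually persist.
    """
    rss = ET.Element("rss", version="2.0")
    chan = ET.SubElement(rss, "channel")
    # Channel info must have been saved beforehand (point 1): the feed
    # may be offline or dead at export time.
    for key in ("title", "link", "description"):
        ET.SubElement(chan, key).text = channel.get(key, "")
    for a in articles:
        item = ET.SubElement(chan, "item")
        for key in ("title", "link", "description", "pubDate"):
            if key in a:
                ET.SubElement(item, key).text = a[key]
        # Keep enclosure separate from link (point 2) instead of
        # collapsing both into one torrentUrl field.
        if "enclosure" in a:
            ET.SubElement(
                item, "enclosure", url=a["enclosure"],
                type=a.get("enclosure_type", "application/x-bittorrent"))
    return ET.tostring(rss, encoding="unicode")
```

Any reader that can subscribe to an RSS 2.0 file can then import the archive directly, which sidesteps the "different software, different format" problem from the OP.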

glassez commented 1 year ago

@Omar-Abdul-Azeez The above looks too complicated, IMO. If I ever dealt with this, I would just try to change RSS feeds to be stored in their original format. Then export would effectively amount to a copy.

GoogleBeEvil commented 1 year ago

@GoogleBeEvil Ok, can you propose how a UX workflow and how the UI would look like for this feature to be implemented ?

UI parts: either of @Omar-Abdul-Azeez's 1st and 2nd approaches is acceptable.

And something more should be considered:

  1. Keeping the page view of the *.html files looking like the original qBit RSS page (3 columns: left feed name, middle title, right details) would be welcome.
  2. Offer a choice to also export the media (posters, preview sample videos, etc.) whose links are in the JSON. I noticed images are cached by qBit after I click the middle column; once cached, the RSS page can show the images even when there is no network. If exporting with images, create folders with the same name as the *.html file, just like the way Telegram archives a channel. Exporting with media will create a huge file and take more time, but it saves the media locally before the links are completely dead.
  3. If possible, sort the middle column (usually it contains titles) by the date element in the JSON ({"date": "02 Sep 2022 16:00:00 +0000"}) or by title name.
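The date strings shown above are RFC 822 dates, which the Python standard library can already parse, so the sorting in point 3 needs very little code. A small sketch (the "date" key is taken from the JSON example above):

```python
from email.utils import parsedate_to_datetime


def sort_by_date(articles, newest_first=True):
    """Sort article dicts by their RFC 822 "date" field,
    e.g. "02 Sep 2022 16:00:00 +0000"."""
    return sorted(articles,
                  key=lambda a: parsedate_to_datetime(a["date"]),
                  reverse=newest_first)
```

Sorting by title is the same call with `key=lambda a: a["title"]`.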

GoogleBeEvil commented 1 year ago

Well, storing RSS as an SQLite DB is another acceptable choice (BitComet does this); then I can edit the DB files and export to other BT clients.
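For reference, a minimal sketch of what that SQLite alternative could look like. The two-table schema here is hypothetical, not BitComet's or qBittorrent's actual layout:

```python
import sqlite3


def make_archive(path=":memory:"):
    """Create a tiny archive DB: one table for feeds, one for articles."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS feeds(
            id INTEGER PRIMARY KEY, name TEXT, url TEXT UNIQUE);
        CREATE TABLE IF NOT EXISTS articles(
            id INTEGER PRIMARY KEY,
            feed_id INTEGER REFERENCES feeds(id),
            title TEXT, link TEXT, description TEXT, date TEXT);
    """)
    return db


def add_article(db, feed_name, feed_url, article):
    """Insert one article, creating its feed row on first sight."""
    row = db.execute("SELECT id FROM feeds WHERE url = ?",
                     (feed_url,)).fetchone()
    feed_id = row[0] if row else db.execute(
        "INSERT INTO feeds(name, url) VALUES(?, ?)",
        (feed_name, feed_url)).lastrowid
    db.execute(
        "INSERT INTO articles(feed_id, title, link, description, date) "
        "VALUES(?, ?, ?, ?, ?)",
        (feed_id, article.get("title"), article.get("link"),
         article.get("description"), article.get("date")))
```

A DB file like this is trivially queryable and editable with any SQLite tool, which is exactly the "edit and re-export" workflow being suggested.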