mikwielgus / forum-dl

Scrape posts and threads from forums, news aggregators, and mail archives; export to JSONL, mailbox, or WARC
MIT License
74 stars, 1 fork

Feature Request: Append @domain to username #6

Closed: peterjschroeder closed this 1 year ago

peterjschroeder commented 1 year ago

Would you consider adding an option to append @domain to the username in the From header?

Example:

https://www.dragonsfoot.org/forums/

From: LittleBigDwarf <LittleBigDwarf@dragonsfoot.org>

mikwielgus commented 1 year ago

I can add such an option, but I don't see a consistent way to strip the subdomain, so the address will be <LittleBigDwarf@www.dragonsfoot.org> instead of <LittleBigDwarf@dragonsfoot.org>.
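A minimal sketch of the address format described here, assuming the forum's base URL is available when the From header is written. `make_from_header` is a hypothetical name, not forum-dl's actual code; it keeps the full hostname returned by `urllib.parse.urlsplit` and formats the header with `email.utils.formataddr`.

```python
from email.utils import formataddr
from urllib.parse import urlsplit


def make_from_header(username: str, forum_url: str) -> str:
    """Build a From header value of the form 'name <name@host>'.

    The full hostname (including a subdomain such as 'www') is kept,
    so users of different subdomain-hosted forums stay distinct.
    """
    host = urlsplit(forum_url).hostname  # e.g. 'www.dragonsfoot.org'
    if host is None:
        raise ValueError(f"URL has no hostname: {forum_url!r}")
    # Assumption: the username is a plain ASCII token. Spaces would need a
    # quoted local-part, and non-ASCII makes formataddr raise; not handled.
    return formataddr((username, f"{username}@{host}"))


print(make_from_header("LittleBigDwarf", "https://www.dragonsfoot.org/forums/"))
# LittleBigDwarf <LittleBigDwarf@www.dragonsfoot.org>
```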

peterjschroeder commented 1 year ago

Couldn't this be done with:

"http://www.dragonsfoot.org".split('.', 1)[-1]

mikwielgus commented 1 year ago

The issue is that on many forum hosting sites, each subdomain corresponds to a separate forum with its own users. An addr-spec in the From header is presumed to be a unique identifier, so I think it would be too confusing to merge John@a.example.com and John@b.example.com into a single John@example.com.
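To make the concern concrete, here is a short check of the suggested one-liner (plain `str.split`, nothing forum-dl-specific). It handles the bare site URL, but keeps the path of the full forum URL, strips too much when there is no subdomain, and, even on clean hostnames, maps users of two different subdomain forums to the same address. `strip_first_label` is just a hypothetical wrapper name.

```python
from urllib.parse import urlsplit


def strip_first_label(url: str) -> str:
    # The one-liner suggested earlier in the thread, wrapped in a function.
    return url.split('.', 1)[-1]


# It works on the bare site URL, but not on every input:
print(strip_first_label("http://www.dragonsfoot.org"))           # dragonsfoot.org
print(strip_first_label("https://www.dragonsfoot.org/forums/"))  # dragonsfoot.org/forums/
print(strip_first_label("https://dragonsfoot.org"))              # org

# Even on clean hostnames, two separate forums collapse into one address:
hosts = [urlsplit(u).hostname for u in ("https://a.example.com/", "https://b.example.com/")]
addrs = {f"John@{strip_first_label(h)}" for h in hosts}
print(addrs)  # {'John@example.com'}
```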

If removing the subdomain is crucial for you, I'd rather look into adding a post-processing hook that can do it.
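Until such a hook exists, the same effect can be had with a standalone post-processing step over an exported mailbox. A minimal sketch using Python's standard `mailbox` and `email.utils` modules, assuming an mbox file as input; `strip_www` and `rewrite_from_headers` are hypothetical names, and this is not a forum-dl interface. It only drops a literal leading `www.`, so other subdomains (which may be separate forums) are left alone.

```python
import mailbox
from email.utils import formataddr, parseaddr


def strip_www(addr: str) -> str:
    # Drop a leading 'www.' from the domain part only; keep everything else.
    local, sep, domain = addr.rpartition('@')
    if sep and domain.lower().startswith('www.'):
        return f"{local}@{domain[4:]}"
    return addr


def rewrite_from_headers(path: str) -> None:
    # Rewrite the From address of every message in an mbox file in place.
    box = mailbox.mbox(path)
    box.lock()
    try:
        for key in box.keys():
            msg = box[key]
            name, addr = parseaddr(msg.get('From', ''))
            new_addr = strip_www(addr)
            if new_addr != addr:
                msg.replace_header('From', formataddr((name, new_addr)))
                box[key] = msg  # store the modified copy back
        box.flush()
    finally:
        box.unlock()
        box.close()
```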

peterjschroeder commented 1 year ago

Understood, no problem. I tend to visit phpBB forums, so I didn't consider that.

For redp (https://github.com/peterjschroeder/redp), I use the Message-ID header to hold unique identifiers.

I plan on writing a mutt script for replying to the forum, but I can work with what you have. An mbox or mail file usually has an email address in angle brackets (<>), which is where the fake address I was suggesting would go.
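On the reply-script side, the address in angle brackets can be extracted with the standard library's `email.utils.parseaddr`. A minimal sketch; `forum_user_from_header` is a hypothetical helper. Note that the host part alone does not recover the forum's base URL, since the `/forums/` path is not in the address.

```python
from email.utils import parseaddr


def forum_user_from_header(from_value: str) -> tuple[str, str]:
    """Split a From value like 'Name <user@host>' into (user, host)."""
    _name, addr = parseaddr(from_value)
    user, sep, host = addr.rpartition('@')
    if not sep:
        raise ValueError(f"no address in From header: {from_value!r}")
    return user, host


print(forum_user_from_header("LittleBigDwarf <LittleBigDwarf@www.dragonsfoot.org>"))
# ('LittleBigDwarf', 'www.dragonsfoot.org')
```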

mikwielgus commented 1 year ago

I've implemented it as I described.

That said, for a replying script, perhaps what you actually want to read from the email headers is the base URL of the forum (https://www.dragonsfoot.org/forums/). This is not written to the output yet, but it will be eventually.

peterjschroeder commented 1 year ago

Thank you.