wzpan opened this issue 11 years ago
Would you propose just computing a Digest::MD5 hash of the file contents (as with assets) and collecting the url -> fingerprint pairs somewhere? I don't personally understand the use case, but a GUID would be possible in the way I've described. Do you know if Wordpress has a way to manage new vs updated posts in RSS so that I may better understand?
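A minimal Ruby sketch of the fingerprint idea, assuming posts are available as a Hash of URL to file contents; fingerprint_index is a hypothetical helper, not part of ruhoh. Digest::MD5.hexdigest is from Ruby's standard library.

```ruby
require 'digest/md5'

# Hypothetical helper: map each post URL to an MD5 fingerprint of the
# file contents, the same idea as fingerprinting assets.
def fingerprint_index(posts)
  posts.each_with_object({}) do |(url, contents), index|
    index[url] = Digest::MD5.hexdigest(contents)
  end
end

posts = {
  '/posts/the-answer/' => "---\ntitle: the answer\n---\nDon't panic.\n",
  '/posts/hello/'      => "---\ntitle: hello\n---\nHi!\n"
}

fingerprint_index(posts).each { |url, fingerprint| puts "#{url} -> #{fingerprint}" }
```

One caveat with this approach: the hash changes whenever the file is edited, so on its own it identifies a particular revision rather than the same article across updates.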
Hi @plusjade,
Is that clear? ;-)
hmm... according to the related reading they actually mean permalink, not guid or uuid. It's poor nomenclature.
In RSS feeds, each <item> may have a <guid>, which may contain either the canonical permalink or an arbitrary string (such as a true UUID).
The purpose is that if the name of the article changes, the aggregator still counts it as the same article, not a new one.
Judging by the vagueness of the specification, and by the fact that its example is a URL, I'd say it's a safer bet to use the permalink than a true UUID.
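To make the guid element concrete, here is a hypothetical Ruby helper (rss_item and xml_escape are illustrative names, not ruhoh APIs) that renders one RSS 2.0 item. In RSS 2.0 the isPermaLink attribute defaults to true; setting it to false marks the guid as an opaque string such as a UUID.

```ruby
# XML special characters and their escaped forms, for text and attributes.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;', "'" => '&apos;' }.freeze

def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Render one <item>. permalink: true says the guid is also a URL that opens
# the post; false says it is just an identifier.
def rss_item(title:, link:, guid:, permalink: true)
  <<~XML
    <item>
      <title>#{xml_escape(title)}</title>
      <link>#{xml_escape(link)}</link>
      <guid isPermaLink="#{permalink}">#{xml_escape(guid)}</guid>
    </item>
  XML
end

puts rss_item(title: 'the answer', link: 'http://example.com/the-answer/',
              guid: 'http://example.com/the-answer/')
```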
Hi @coolaj86,
hmm... according to the related reading they actually mean permalink, not guid or uuid. It's poor nomenclature.
Yes! A GUID can be a permalink! Here is a clearer specification of one way to build such identifiers: http://www.ietf.org/rfc/rfc4151.txt (RFC 4151, "The 'tag' URI Scheme").
Adding the guid tag when generating rss.xml would also be useful.
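A small sketch of building an RFC 4151 tag URI in Ruby; tag_uri is a hypothetical helper. The URI has the shape tag:authority,date:specific, where the date should be one on which the tagging entity owned the authority name (it does not have to be the post's date).

```ruby
require 'date'

# Hypothetical helper building an RFC 4151 tag URI: tag:<authority>,<date>:<specific>
def tag_uri(authority, date, specific)
  "tag:#{authority},#{date.strftime('%Y-%m-%d')}:#{specific}"
end

puts tag_uri('hahack.com', Date.new(2013, 4, 2), 'posts/42')
# tag:hahack.com,2013-04-02:posts/42
```

If the specific part is something stable, such as a post id, the URI survives title changes; if it is derived from the title, it changes along with it.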
@plusjade,
Do you know if Wordpress has a way to manage new vs updated posts in RSS so that I may better understand?
Yes. Maybe ruhoh needs to know whether a post is a new one or just an update. Otherwise, if we modify the title, RSS readers will regard the post as new even though it is just an update, because the guid has changed (to the new permalink).
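To illustrate the problem, a sketch assuming a hypothetical title-based slug rule (ruhoh's actual permalink format may differ): renaming a post changes its permalink, and therefore its guid.

```ruby
# Hypothetical slug rule: lowercase, runs of other characters become '-'.
def slugify(title)
  title.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')
end

# The permalink doubles as the guid, so it inherits every title change.
def permalink_guid(base_url, title)
  "#{base_url}/#{slugify(title)}/"
end

before = permalink_guid('http://hahack.com', 'The Answer')
after  = permalink_guid('http://hahack.com', 'The Answer, Revised')
puts before          # http://hahack.com/the-answer/
puts after           # http://hahack.com/the-answer-revised/
puts before == after # false, so an RSS reader sees a "new" item
```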
As far as I know (not 100% sure), WordPress first generates a postid when the user creates a draft. The postid can be a short id, since it only needs to identify articles within the same site; it doesn't need to be globally unique. For ruhoh, we could generate one and attach it to the YAML post metadata. An example:
---
date: '2013-4-2'
title: the answer
description: don't panic
tags: [explore]
categories: fiction
postid: '42'
---
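A minimal sketch of reading postid from such a post in Ruby, assuming the front matter is delimited by --- lines as above; parse_post is a hypothetical helper. YAML.safe_load is Psych's standard loader, and because date and postid are quoted they come back as strings.

```ruby
require 'yaml'

# Split a post into its YAML front matter and body. Assumes the file starts
# with a '---' line and the metadata ends at the next '---' line.
def parse_post(text)
  match = text.match(/\A---\s*\n(.*?)\n---\s*\n(.*)\z/m)
  raise ArgumentError, 'no front matter found' unless match

  [YAML.safe_load(match[1]), match[2]]
end

post = <<~POST
  ---
  date: '2013-4-2'
  title: the answer
  description: don't panic
  tags: [explore]
  categories: fiction
  postid: '42'
  ---
  Body goes here.
POST

meta, body = parse_post(post)
p meta['postid']  # => "42"
```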
When generating the RSS, also generate a guid from the postid. I think a good guid would be production_url/?p=postid, such as http://hahack.com/?p=42. Then we add the guid tag to the RSS file.
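A sketch of the proposed guid, with guid_for and guid_element as hypothetical helpers. One assumption worth flagging: on a static site a URL like http://hahack.com/?p=42 probably won't open the post itself, so this sketch marks the guid isPermaLink="false".

```ruby
# Hypothetical: WordPress-style guid built from production_url and postid.
def guid_for(production_url, postid)
  "#{production_url.chomp('/')}/?p=#{postid}"
end

# Marked as an opaque identifier rather than a link that opens the post.
def guid_element(production_url, postid)
  %(<guid isPermaLink="false">#{guid_for(production_url, postid)}</guid>)
end

puts guid_element('http://hahack.com', '42')
# <guid isPermaLink="false">http://hahack.com/?p=42</guid>
```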
The more I think about it, the more I realize how important postid is: it is an ideal basis for ordering posts!
I'm not quite in favor of the way ruhoh sorts my posts: they are sorted neither alphabetically nor exactly by time. For example, take a look at these two posts from my homepage.
What's worse, all the articles from my wiki are sorted even more "randomly": I didn't add the "date" metadata, so ruhoh seems to sort them in a strange way.
Therefore, I think a better choice is to sort the posts by postid, just like WordPress does. When a draft is created, also generate a number. It can be an integer that works like a post counter, so the id grows incrementally.
Probably a step would work even better. For example, when I create draft A, ruhoh assigns it post id 10. Then I create a new draft C, and ruhoh assigns it post id 15 instead of 11. Now, if I need to insert a draft B between A and C, I can simply change its assigned post id (20) in the metadata to a number in [11, 14], without having to edit the post ids of draft C (and drafts D, E, F, ...)!
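A sketch of the stepped-id scheme described above (Post and id_between are hypothetical names): B first receives the next counter value, 20, and is then edited by hand to a free id between A and C.

```ruby
# Hypothetical stepped post ids (step 5): drafts A and C got 10 and 15.
Post = Struct.new(:title, :postid)

# Pick a free id strictly between two neighbours (here the midpoint).
def id_between(lower, upper)
  raise ArgumentError, "no free id between #{lower} and #{upper}" if upper - lower < 2

  (lower + upper) / 2
end

posts = [Post.new('A', 10), Post.new('C', 15)]
draft_b = Post.new('B', 20)          # the counter hands B the next id
draft_b.postid = id_between(10, 15)  # edited by hand to 12, within [11, 14]
posts << draft_b

p posts.sort_by(&:postid).map(&:title)  # => ["A", "B", "C"]
```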
A log is needed to keep track of the post id counter: each time ruhoh creates a draft, it reads the last post id from the log file, calculates the new post id (id_new = id_old + step), and writes it into the draft. After that, it writes the new post id back into the log file.
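A minimal sketch of the counter log, assuming a plain-text file that holds only the last assigned id; next_postid and the file name postid.log are hypothetical. It is not safe against two processes creating drafts at the same moment.

```ruby
require 'tmpdir'

# Hypothetical counter file storing the last assigned post id. Creating a
# draft reads it, adds the step, writes the new id back, and returns it.
def next_postid(counter_path, step: 5, start: 0)
  last = File.exist?(counter_path) ? Integer(File.read(counter_path).strip, 10) : start
  new_id = last + step
  File.write(counter_path, new_id.to_s)
  new_id
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'postid.log')
  # Starting from 5 reproduces the example above: A = 10, C = 15, B = 20.
  p 3.times.map { next_postid(path, start: 5) }  # => [10, 15, 20]
end
```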
Also, make it optional. Without a postid, ruhoh sorts the articles by time and uses the permalink as the guid tag in rss.xml. That guarantees backward compatibility.
I hope you will carefully consider my suggestion. It is the key to saving my wiki! ;-)
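A sketch of the optional behavior, assuming posts are Hashes of front-matter values and dates are written as YYYY-M-D; sort_posts, guid_for_post and parse_date are hypothetical helpers. In this sketch posts are sorted by postid only when every post has one.

```ruby
require 'date'

# Assumes dates are written as YYYY-M-D, like the front-matter example above.
def parse_date(text)
  Date.new(*text.split('-').map { |part| Integer(part, 10) })
end

# Hypothetical opt-in: sort by postid only when every post has one,
# otherwise keep sorting by date as before.
def sort_posts(posts)
  if posts.all? { |post| post['postid'] }
    posts.sort_by { |post| Integer(post['postid'].to_s, 10) }
  else
    posts.sort_by { |post| parse_date(post['date']) }
  end
end

# Use a postid-based guid when available, otherwise fall back to the permalink.
def guid_for_post(post, production_url)
  post['postid'] ? "#{production_url}/?p=#{post['postid']}" : post['permalink']
end

legacy = [
  { 'date' => '2013-4-9', 'permalink' => 'http://hahack.com/b/' },
  { 'date' => '2013-4-2', 'permalink' => 'http://hahack.com/a/' }
]
sort_posts(legacy).each { |post| puts guid_for_post(post, 'http://hahack.com') }
```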
I think that extending the DATE to include the TIME would be a better approach than a number that increments by 5.
Time is much more granular.
Although I wouldn't necessarily want it to display the second at which I created the post, it should be there for history's sake.
I'd prefer the date to be split into created_at and modified_at, so that once there's an online editor we can see both the original date and when the post was updated.
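A sketch of created_at and modified_at stored as full ISO 8601 timestamps (the field names come from the comment above; the timestamp format is an assumption). The display can show only the date while the full time stays available for history and sorting.

```ruby
require 'yaml'
require 'time'

# Hypothetical front matter with full ISO 8601 timestamps. They are quoted
# so YAML.safe_load returns plain strings.
meta = YAML.safe_load(<<~FRONT)
  title: the answer
  created_at: '2013-04-02T13:15:30Z'
  modified_at: '2013-04-05T09:00:00Z'
FRONT

created  = Time.iso8601(meta['created_at'])
modified = Time.iso8601(meta['modified_at'])

puts created.strftime('%Y-%m-%d')  # shown to readers: 2013-04-02
puts created.iso8601               # kept for history: 2013-04-02T13:15:30Z
puts modified > created            # true: the post was updated after creation
```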
A time is more complex and longer than a post id, so it brings more storage and transfer cost.
Although I wouldn't necessarily want it to display the second at which I created the post, it should be there for history's sake.
In fact, I seldom use the draft command to create drafts; I create them directly in my Emacs editor and generate the post metadata with the help of yasnippet. So ruhoh would not be able to record the implicit creation time in its history. Also, I would be exhausted trying to add the missing creation time to all the posts from my wiki, because neither I nor the Linux file system can remember it!
Compared to time, postid is more transparent and controllable. The step can be changed too; I suggest making it a variable in config.yml, with 5 as the default. Sites that need more room to insert articles can use a larger number.
Process ids in Linux are allocated incrementally, too.
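A sketch of reading the step from config.yml with a default of 5; the key name postid_step and the postid_step helper are hypothetical.

```ruby
require 'yaml'

DEFAULT_POSTID_STEP = 5

# Hypothetical config.yml key 'postid_step'; 5 is used when it is absent.
def postid_step(config_yaml)
  config = YAML.safe_load(config_yaml) || {}
  step = config.fetch('postid_step', DEFAULT_POSTID_STEP)
  unless step.is_a?(Integer) && step.positive?
    raise ArgumentError, "postid_step must be a positive integer, got #{step.inspect}"
  end

  step
end

p postid_step("postid_step: 10\n")  # => 10
p postid_step("title: my site\n")   # => 5
```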
I like both ideas, but I don't think they should get mixed up: maybe the postid should rather be called permanentid and should never be changed (after the first release). This would be great to use for the guid in RSS.
But I think it shouldn't matter what's in there: let it be some descriptive text, a UUID, or an integer, but the compiler should check that it is unique ;)
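A sketch of the uniqueness check the compiler could perform, assuming posts are Hashes with a hypothetical permanentid key; check_unique_ids! is also hypothetical. Values are compared as strings so that 42 and '42' count as duplicates.

```ruby
# Hypothetical build-time check: permanentid values (text, UUID or integer)
# are compared as strings and must be unique across all posts.
def check_unique_ids!(posts)
  groups = posts.group_by { |post| post['permanentid']&.to_s }
  dupes = groups.select { |id, group| id && group.size > 1 }
  return if dupes.empty?

  details = dupes.map { |id, group| "#{id} (#{group.map { |post| post['file'] }.join(', ')})" }
  raise ArgumentError, "duplicate permanentid: #{details.join('; ')}"
end

posts = [
  { 'file' => 'the-answer.md', 'permanentid' => 'the-answer' },
  { 'file' => 'python-basic.md', 'permanentid' => 42 },
  { 'file' => 'python-data-type.md', 'permanentid' => '42' }
]

begin
  check_unique_ids!(posts)
rescue ArgumentError => e
  puts e.message  # duplicate permanentid: 42 (python-basic.md, python-data-type.md)
end
```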
But why should this affect the ordering of the articles?
Use the created(_at) field if available, or the date field if available, or the file created date/time for sorting.
And if those fields support adding a time when required, it would help with sorting articles from the same day.
This should be pretty easy to understand for everybody. (Did I forget any aspect of the discussion?)
just my 2c
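A sketch of karfau's fallback order, assuming front-matter Hashes and a file time supplied by the caller (for example File.mtime, a modification time standing in for the creation time that many Unix filesystems don't expose); sort_time and sort_entries are hypothetical.

```ruby
require 'time'

# Fallback order: created_at, then date, then the caller-supplied file time.
# Values may include a time of day, which orders posts from the same day.
def sort_time(meta, file_time)
  value = meta['created_at'] || meta['date']
  value ? Time.parse(value.to_s) : file_time
end

def sort_entries(entries)
  entries.sort_by { |entry| sort_time(entry[:meta], entry[:file_time]) }
end

entries = [
  { name: 'wiki-page', meta: {}, file_time: Time.parse('2013-04-03 00:00') },
  { name: 'second', meta: { 'date' => '2013-04-02 09:00' }, file_time: Time.now },
  { name: 'first', meta: { 'created_at' => '2013-04-02 08:00', 'date' => '2013-04-05' }, file_time: Time.now }
]

p sort_entries(entries).map { |entry| entry[:name] }  # => ["first", "second", "wiki-page"]
```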
Hi @karfau,
Thanks for your comment.
But why should this affect the ordering of the articles?
OK, I will try to explain (informally) why I dislike using time for sorting.
In most situations, like blogging, yes, sorting by time is enough.
However, suppose I decide to write a book on Python with ruhoh (why not? :smiley_cat:). I've written the first chapter, titled Python: Basic, and the second chapter, titled Python: Data Struct, when suddenly I think "Oh rats! I forgot to write something about data types before introducing data structures!" So I write a new post titled Python: Data Type. But sadly, since it was created later, it is sorted as the third chapter instead of the second!
Now how do I fix it? Well, you may think that modifying the created time of the second or third post would change the order. But isn't that dirty?
Now the problem gets even more troublesome: I've finished all the chapters. "That's so nice. I'm really great!" Suddenly I come up with an idea: "Oh shit! I forgot to insert an exercise chapter after each chapter!!!" Now I need to make so many evil modifications to created times. Finally I go mad and kill myself.
If the posts are sorted by post id, however, I can easily modify the post id of each post to move it ahead. That saves my life.
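To make the book example concrete, a sketch with hypothetical chapter data comparing the two sort keys: creation time puts "Python: Data Type" last, while a hand-edited post id (12, between 10 and 15) puts it second.

```ruby
# Hypothetical chapters: "Data Type" was written last but belongs second.
Chapter = Struct.new(:title, :created_at, :postid)

chapters = [
  Chapter.new('Python: Basic',       Time.utc(2013, 4, 1), 10),
  Chapter.new('Python: Data Struct', Time.utc(2013, 4, 2), 15),
  Chapter.new('Python: Data Type',   Time.utc(2013, 4, 3), 12) # id edited to fit between 10 and 15
]

by_time = chapters.sort_by(&:created_at).map(&:title)
by_id   = chapters.sort_by(&:postid).map(&:title)

p by_time  # => ["Python: Basic", "Python: Data Struct", "Python: Data Type"]
p by_id    # => ["Python: Basic", "Python: Data Type", "Python: Data Struct"]
```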
or the file created date/time
But *nix file systems traditionally do NOT record the file creation time (POSIX stat has no such field)!
OK, I totally get your point about the ordering.
But re-reading the whole discussion from the beginning, it started with how RSS readers can tell whether something is a new article or just an update of an existing one.
Using a postid is one valid solution to this issue (as long as you don't change those ids after the first release).
In my view, another issue is mixed in here: ordering (which maybe should be discussed in another ticket?). For posts, the most natural way to sort would be by a date/time persisted in some way. I see that this makes no sense in the specific use case you describe.
I have had this problem multiple times, especially for sites that I needed a(t least one) custom, changeable order for. I solved it by using a list of the sites somewhere in a config or even in a special page. All those times I thought about how nice it would be to have the possibility to tell ruhoh to sort a collection of items by a specific attribute. This would be the most flexible solution, and could fix a lot of issues with ordering.
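A sketch of the explicit-list workaround, assuming a hypothetical order key in a YAML config; names not in the list sort after the listed ones, alphabetically.

```ruby
require 'yaml'

# Hypothetical config: pages listed in the order they should appear.
config = YAML.safe_load(<<~CONFIG)
  order:
    - intro
    - install
    - usage
CONFIG

pages = %w[usage changelog intro install]

# Listed pages come first, in list order; any others follow alphabetically.
rank = config['order'].each_with_index.to_h
sorted = pages.sort_by { |name| [rank.fetch(name, rank.size), name] }

p sorted  # => ["intro", "install", "usage", "changelog"]
```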
@karfau
I have had this problem multiple times, especially for sites that I needed a(t least one) custom, changeable order for (...) All those times I thought about how nice it would be to have the possibility to tell ruhoh to sort a collection of items by a specific attribute.
Custom sort order is supported in v2+ via the base model_view.
You can specify the attribute and sort direction on a per-collection basis by updating config.yml:
# config.yml
essays:
  sort: ['guid', 'asc'] # Array is required
This will sort essays by the guid attribute in ascending order. Ascending/descending is handled by Ruby's native comparison operator (<=>), so it will handle dates, numbers, and strings (alphabetically).
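This is not ruhoh's model_view code; it is a hypothetical Ruby sketch (sort_collection) of the behavior described: read [attribute, direction] from the collection's config and compare values with <=>.

```ruby
require 'yaml'

# Hypothetical sketch: sort items by config['sort'] = [attribute, direction],
# using Ruby's <=> so dates, numbers and strings all compare naturally.
def sort_collection(items, collection_config)
  attribute, direction = collection_config.fetch('sort')
  sorted = items.sort { |a, b| a[attribute] <=> b[attribute] }
  direction == 'desc' ? sorted.reverse : sorted
end

config = YAML.safe_load(<<~CONFIG)
  essays:
    sort: ['guid', 'asc']
CONFIG

essays = [{ 'guid' => 'c' }, { 'guid' => 'a' }, { 'guid' => 'b' }]
p sort_collection(essays, config['essays']).map { |essay| essay['guid'] }  # => ["a", "b", "c"]
```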
Sorry, this is not documented =/. Regarding the primary thread topic, I've been working with @coolaj86 on this issue and I'll reply here after I get up to speed on all the feedback in this thread.
Hi,
I wonder if we could generate a GUID (Globally Unique Identifier) when creating a new draft, or at least attach such information in the RSS generator.
Without the GUID, it is difficult for applications such as RSS readers to determine whether an article is a new post or just an update.
It also makes migrating articles from one site to another tedious (for instance, we may need to do a lot of work to redirect the comments from the old site).
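A minimal sketch of the original request, assuming a hypothetical guid front-matter key and a hypothetical new_draft helper: a UUID is generated once, when the draft is created, so it survives later title or URL changes. SecureRandom.uuid is from Ruby's standard library; in RSS such an id could be emitted as urn:uuid:<uuid> with isPermaLink="false".

```ruby
require 'securerandom'
require 'date'
require 'yaml'

# Hypothetical draft template: the guid is a random (version 4) UUID generated
# once, at creation time, and stored in the front matter.
def new_draft(title, today: Date.today)
  meta = {
    'title' => title,
    'date'  => today.strftime('%Y-%m-%d'),
    'guid'  => SecureRandom.uuid
  }
  # to_yaml starts with "---"; append the closing delimiter.
  "#{meta.to_yaml}---\n"
end

puts new_draft('Python: Data Type')
```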