szcf-weiya / ESL-CN

Chinese translation of The Elements of Statistical Learning (ESL), with code implementations and solutions to the exercises.
https://esl.hohoweiya.xyz
GNU General Public License v3.0

url or identifier for disqus thread #219

Closed · szcf-weiya closed this 4 years ago

szcf-weiya commented 4 years ago

I just realized that I forgot to set this.page.url and this.page.identifier after switching to Disqus for mainland China, so Disqus creates a new thread for every distinct page URL, for example treating

https://esl.hohoweiya.xyz/05-Basis-Expansions-and-Regularization/5.2-Piecewise-Polynomials-and-Splines/index.html

and

https://esl.hohoweiya.xyz/05-Basis-Expansions-and-Regularization/5.2-Piecewise-Polynomials-and-Splines/index.html?from=singlemessage&isappinstalled=0

as different pages, but they are the same.

Without manually specifying the url or identifier, Disqus takes window.location.href as the url (https://help.disqus.com/en/articles/1717084-javascript-configuration-variables). In that case, comments posted under the first URL will not appear under the second, and each URL keeps its own comments. That is too bad!!
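A quick way to confirm that the two URLs above point to the same page is to strip the query string; a minimal sketch (the same kind of sed cleanup is applied to the ?from links in a later comment):

echo 'https://esl.hohoweiya.xyz/05-Basis-Expansions-and-Regularization/5.2-Piecewise-Polynomials-and-Splines/index.html?from=singlemessage&isappinstalled=0' | sed 's/?.*//'
# prints https://esl.hohoweiya.xyz/05-Basis-Expansions-and-Regularization/5.2-Piecewise-Polynomials-and-Splines/index.html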

szcf-weiya commented 4 years ago

In Disqus for mainland China, if the identifier is empty, it is set to the url (https://github.com/fooleap/disqus-php-api/blob/88dd45a767b9205f9a599ee8de9148aba2af8944/src/iDisqus.js#L280), and if no existing thread is found for that url, it asks to create a new thread, as happened when I ran it on localhost (screenshot attached). So the key point is to assign a unique url to each page without violating the current setting, i.e., do not use a different identifier.

szcf-weiya commented 4 years ago

The moderation panel logs also support the above guess: some new threads were created even though threads for those pages already existed. (screenshot attached)

szcf-weiya commented 4 years ago

Another related problem is that the latest-comments sidebar has not been updated (screenshot attached). I found that some of its links are outdated, i.e., they point to different urls and may belong to different threads. So I followed https://mycyberuniverse.com/how-delete-discussion-threads-incorrect-url-disqus.html and tried to correct several outdated links by url mapping (the maps are attached), to see whether the sidebar updates more frequently.

urlmaps-2020-02-27.txt

It works immediately!!
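For reference, each line of such a map file pairs the old url with the corrected one, in the old_url, new_url format required by Disqus; an illustrative entry (not taken from the attached file) for the query-string case above would be:

https://esl.hohoweiya.xyz/05-Basis-Expansions-and-Regularization/5.2-Piecewise-Polynomials-and-Splines/index.html?from=singlemessage&isappinstalled=0, https://esl.hohoweiya.xyz/05-Basis-Expansions-and-Regularization/5.2-Piecewise-Polynomials-and-Splines/index.html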

szcf-weiya commented 4 years ago

replace // with /

Some historical links contain a double slash. First, get such records:

grep -E "z//" "esl-hohoweiya-xyz-2020-02-27T05:05:22.623114-links.csv" | sed "s/\r//"  > raw_double_slash.txt 

where sed removes the ^M at the end of each line (see "Text file with ^M on each line" and "How to remove CTRL-M (^M) characters from a file in Linux"). Then convert them and write them in the old_url, new_url format required by Disqus:

grep -E "z//" "esl-hohoweiya-xyz-2020-02-27T05:05:22.623114-links.csv" | sed "s/z\/\//z\//" > fixed_double_slash.txt
paste -d', ' raw_double_slash.txt fixed_double_slash.txt > double_slash_maps.txt

double_slash_maps.txt
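An optional sanity check on the generated map (not required by Disqus) catches misaligned rows or urls that themselves contain commas before uploading:

# every map line should have exactly two comma-separated fields
awk -F',' 'NF != 2 {print NR": "$0}' double_slash_maps.txt
# spot-check the first few entries
head -n 3 double_slash_maps.txt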

szcf-weiya commented 4 years ago

end with /index.html not /

extract the records

sed -n "s/\/\r/\//p" "esl-hohoweiya-xyz-2020-02-27T05:05:22.623114-links.csv" > no_index.txt

then modify them

sed -n "s/\/\r/\/index.html/p" "esl-hohoweiya-xyz-2020-02-27T05:05:22.623114-links.csv" > fix_no_index_maps.txt

and then write them out as maps:

paste -d ',' no_index.txt fix_no_index_maps.txt > no_index_maps.txt

One more step is to replace %20 with -:

sed -i "s/\%20/-/g" no_index_maps.txt

no_index_maps.txt
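Since paste pairs the two files line by line, an optional check that the intermediate files stayed in sync (an extra precaution, not part of the steps above):

wc -l no_index.txt fix_no_index_maps.txt
# the two line counts should be equal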

szcf-weiya commented 4 years ago

replace spaces (%20) with -

extract records,

sed -n "/\%20/p" "esl-hohoweiya-xyz-2020-02-27T07:42:29.756937-links.csv" | sed "s/\r//" > space_delim.txt

then

sed 's/\%20/-/g' space_delim.txt > fix_space_delim.txt

and some particular fixes:

sed -i 's/,//g' fix_space_delim.txt
sed -i 's/\%2C//g' fix_space_delim.txt
sed -i 's/\/$/\/index.html/' fix_space_delim.txt

Note that the last one only works when the line does not end with \r, i.e., the earlier sed "s/\r//" is necessary.

space_delim_maps.txt
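An optional verification that the substitutions took effect: after the fixes, the corrected column should contain no encoded spaces or commas.

grep -E '%20|%2C' fix_space_delim.txt
# no output means every occurrence was replaced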

szcf-weiya commented 4 years ago

some special fixes

sed -n '/?from/p' "esl-hohoweiya-xyz-2020-02-27T09:25:59.917885-links.csv" | sed 's/\r//' > specials.txt 
sed -n 's/?from.*\r//p' "esl-hohoweiya-xyz-2020-02-27T09:25:59.917885-links.csv" | sed 's/\r//'> fix_specials.txt
paste -d',' specials.txt fix_specials.txt > special_maps.txt

and

sed -n '/\%2520/p' "esl-hohoweiya-xyz-2020-02-27T09:25:59.917885-links.csv" | sed 's/\r//' > specials1.txt
sed -n 's/\%2520/-/gp' "esl-hohoweiya-xyz-2020-02-27T09:25:59.917885-links.csv" | sed 's/\r//' > fix_specials1.txt
sed -i 's/,//g' fix_specials1.txt
paste -d',' specials1.txt fix_specials1.txt > special_maps1.txt

then combine them with cat into special_maps_all.txt.
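A minimal sketch of that step, assuming the two map files produced above:

cat special_maps.txt special_maps1.txt > special_maps_all.txt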

szcf-weiya commented 4 years ago

Checking the url list shows that the URLs have indeed been updated, but the exported comments show no change, and the Edit Discussions panel has not changed either. Maybe it needs 24 hours, as stated.

szcf-weiya commented 4 years ago

I guess the migration only takes effect when the pages are visited: I export the comments every day and found that some threads have indeed been fixed, but some still remain.