v2fly / domain-list-community

Community managed domain list. Generate geosite.dat for V2Ray.
https://www.v2fly.org
MIT License

Enhancement: add @cn attribute to `category-scholar-!cn` according to their accessibility #1943

Open Paulgudring opened 9 months ago

Paulgudring commented 9 months ago

Thank you all for your great effort and contributions to the community.

Currently, the lists category-scholar and category-scholar-!cn are separated by geographic location, which is consistent with the repo's principle. However, some of these websites can be accessed from China, either because they have an ISP server in China or because their overseas services are not blocked, and some of them authorize access to content by the user's IP (like clarivate).

Earlier issues, such as #674, have discussed this. I agree with the maintainer's view and I'm fully aware of the difficulty. The solution given there is to fall back to geoip resolution. The process is as follows:

  1. geosite category-scholar-cn(included in cn) --> direct
  2. geosite category-scholar-!cn(not included in geolocation-!cn) --> pass
  3. geoip cn --> direct

To avoid DNS leaks, I propose a three-way split: category-scholar-cn, category-scholar-!cn@cn, and category-scholar-!cn. The process would then be as follows:

  1. geosite category-scholar-cn(included in cn) --> direct
  2. geosite category-scholar-!cn@cn --> direct
  3. geosite category-scholar-!cn --> proxy
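
The difference between the two rule orders above can be sketched with a small first-match model. This is only an illustration, not V2Ray's actual routing code: the set contents, the `route` function, and the `resolve` callback are hypothetical, and the model assumes that reaching a geoip rule forces the domain to be resolved to an IP.

```python
# Simplified first-match routing model (NOT V2Ray's implementation).
# Set contents are hypothetical placeholders for illustration only.
GEOSITE = {
    "category-scholar-cn": {"cnki.net"},
    "category-scholar-!cn@cn": {"webofscience.com"},
    "category-scholar-!cn": {"webofscience.com", "arxiv.org"},
}

# Current flow: category-scholar-!cn is not matched, so it falls through to geoip.
CURRENT_RULES = [
    ("geosite", "category-scholar-cn", "direct"),
    ("geoip", "cn", "direct"),
]

# Proposed flow: every scholar domain is decided by a geosite rule.
PROPOSED_RULES = [
    ("geosite", "category-scholar-cn", "direct"),
    ("geosite", "category-scholar-!cn@cn", "direct"),
    ("geosite", "category-scholar-!cn", "proxy"),
]


def matches(domain, suffixes):
    """True if domain equals, or is a subdomain of, any listed suffix."""
    return any(domain == s or domain.endswith("." + s) for s in suffixes)


def route(domain, rules, resolve, default="proxy"):
    """Return (outbound, resolved); resolved tells whether an IP lookup was needed."""
    resolved = False
    for kind, tag, outbound in rules:
        if kind == "geosite":
            if matches(domain, GEOSITE[tag]):
                return outbound, resolved
        elif kind == "geoip":
            resolved = True  # assumption: an IP rule requires resolving the domain
            if resolve(domain) == tag:
                return outbound, resolved
    return default, resolved


def fake_resolve(domain):
    """Hypothetical geoip lookup: pretend only webofscience.com resolves to a CN IP."""
    return "cn" if matches(domain, {"webofscience.com"}) else "us"
```

In this model, `route("www.webofscience.com", CURRENT_RULES, fake_resolve)` goes direct only after a lookup, while the same call with `PROPOSED_RULES` goes direct without one.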

For now, I use domain-specific rules in my config, but that lowers both the performance and the readability of the config.

I'm not sure about the principle behind the @cn attribute. Is the criterion strictly whether there is an ISP server in China?

I understand that the problem lies in the high specificity caused by variance between network environments. I wrote a simple code snippet to test accessibility (it just pinged the domains, and many of the results were wrong). But I haven't figured out a proper way to batch-test connectivity across different networks, so I didn't open a PR.

But if connectivity were the only criterion, the proposal would amount to assigning @cn to every domain that does not appear on the gfwlist, i.e., a blacklist exemption for academic sites inside an otherwise whitelist-based configuration. That deviates from both the repository's principle of sets and my intent, so I believe the strictest standard for this attribute should be having an ISP server in China (like clarivate - webofscience.com).
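
As a possible alternative to ping, a TCP connect on port 443 is closer to what a browser needs, since ICMP may be filtered independently of HTTPS. The following is a minimal sketch, not the snippet used for this issue; the names `tcp_reachable` and `check_all` are hypothetical. It still runs from a single network, does not detect TLS-level blocking or IP-based content restrictions, and says nothing about whether a server is located in China.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def tcp_reachable(host, port=443, timeout=3.0):
    """True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, timeouts, and refused connections
        return False


def check_all(domains, probe=tcp_reachable, workers=16):
    """Probe each distinct domain once; return {domain: bool} in input order."""
    unique = list(dict.fromkeys(domains))  # drop duplicates, keep order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results cannot be mislabeled
        results = list(pool.map(probe, unique))
    return dict(zip(unique, results))
```

Calling `check_all(["webofscience.com", "arxiv.org"])` returns one verdict per domain. The `probe` parameter lets a different check (or a stub for testing) be swapped in.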

aacrjournals.org is not accessible
academic.eb.com is not accessible
acaric.co.jp is not accessible
aclweb.org is not accessible
acm.org is not accessible
acs.org is not accessible
agu.org is not accessible
aiaa.org is not accessible
aimsciences.org is not accessible
airiti.com @cn
airitilibrary.com @cn
altmetric.com is not accessible
alexanderstreet.com is not accessible
amdigital.co.uk is not accessible
academic.eb.com @cn
acm.org @cn
aclweb.org @cn
acaric.co.jp @cn
aacrjournals.org @cn
airiti.com is not accessible
agu.org @cn
acs.org @cn
aimsciences.org @cn
aiaa.org @cn
altmetric.com @cn
amdigital.co.uk @cn
airitilibrary.com is not accessible
annualreviews.org @cn
aps.org @cn
ams.org is not accessible
alexanderstreet.com is not accessible
anatomy.tv is not accessible
artstor.org @cn
analytictech.com is not accessible
ascelibrary.org @cn
asha.org @cn
asme.org @cn
arabidopsis.org is not accessible
aspbjournals.org @cn
aspenpublishing.com @cn
astm.org @cn
berkeley.edu @cn
biologists.com @cn
biomedcentral.com @cn
bioone.org @cn
arxiv.org is not accessible
asm.org is not accessible
asminternational.org is not accessible
asn-online.org is not accessible
bmj.com @cn
biorxiv.org is not accessible
bloomsburycollections.com is not accessible
bvdinfo.com @cn
bloomsburydesignlibrary.com is not accessible
cairn.info @cn
capitaliq.com @cn
ceicdata.com is not accessible
booksinprint.com is not accessible
brepolis.net is not accessible
chemnetbase.com @cn
choicereviews.org @cn
cios.org @cn
cmu.edu @cn
cochranelibrary.com @cn
degruyter.com @cn
dentalhypotheses.com @cn
ebsco.com @cn
ebscohost.com @cn
elgaronline.com @cn
elifesciences.org @cn
brill.com is not accessible
emerald.com @cn
ems-ph.org @cn
cas.org is not accessible
europepmc.org @cn
facultyopinions.com @cn
computingreviews.com is not accessible
frontiersin.org @cn
gale.com @cn
electrochem.org is not accessible
geolytics.com @cn
ggsrv.com is not accessible
global-sci.org @cn
highwirepress.com @cn
embase.com is not accessible
hindawi.com @cn
icevirtuallibrary.com @cn
igi-global.com @cn
igpublish.com @cn
eurekaselect.com is not accessible
galegroup.com is not accessible
heinonline.org is not accessible
ioinformatics.org @cn
iop.org @cn
icsd.fiz-karlsruhe.de is not accessible
isca-speech.org @cn
iwaponline.com @cn
jamanetwork.com @cn
japanknowledge.com @cn
jbe-platform.com @cn
hanzhen.xmulib.org is not accessible
jmlr.org @cn
jstage.jst.go.jp is not accessible
jneurosci.org @cn
infolinker.com.tw is not accessible
jstor.org @cn
karger.com @cn
kuke.com @cn
lexisnexis.com @cn
literatumonline.com @cn
mdpi.com @cn
medrxiv.org @cn
nature.com @cn
morganclaypool.com @cn
informs.org is not accessible
naturemag.org @cn
ncl.edu.tw is not accessible
nii.ac.jp is not accessible
neurology.org @cn
oecd-ilibrary.org @cn
osapublishing.org @cn
ovid.com @cn
jove.com is not accessible
lawdata.com.tw is not accessible
plos.org @cn
pnas.org @cn
nejm.org is not accessible
projecteuclid.org @cn
optica.org is not accessible
researchgate.net @cn
routledgehandbooks.com @cn
royalsocietypublishing.org @cn
rsc.org @cn
peerj.com is not accessible
rupress.org @cn
scholarpedia.org @cn
physiology.org is not accessible
princeton.edu is not accessible
sagepub.com is not accessible
sae.org is not accessible
scientificamerican.com @cn
science.com is not accessible
scitation.org @cn
semanticscholar.org @cn
silverchair-cdn.com is not accessible
siam.org @cn
science.org is not accessible
spiedigitallibrary.org @cn
statsmakemecry.com @cn
thelancet.com @cn
tickdata.com @cn
thieme.de @cn
thieme-connect.com @cn
thieme-connect.de @cn
medone-education.thieme.com @cn
sciencemag.org is not accessible
ucla.edu @cn
umass.edu @cn
usaco.org @cn
westlaw.com @cn
scienceonline.org is not accessible
worldscientific.com @cn
yale.edu @cn
zenodo.org @cn
totalmateria.com is not accessible
uchicago.edu is not accessible
dl.begellhouse.com @cn
databank.worldbank.org @cn
database.asahi.com is not accessible
wiley.com is not accessible
wolterskluwer.com is not accessible
muse.jhu.edu @cn
angle.com.tw is not accessible
beck-online.beck.de is not accessible
elib.maruzen.co.jp is not accessible
firstsearch.oclc.org is not accessible
t21.nikkei.co.jp is not accessible
t21ipau.nikkei.co.jp is not accessible
ulrichsweb.serialssolutions.com is not accessible
wrds-www.wharton.upenn.edu is not accessible

Paulgudring commented 9 months ago

In short,

  1. For users who need this, based on the discussion in #256, try geolocation-!cn & !gfw. It changes the least and should conform to the repo's principle; it is something like a "local blacklist in a global whitelist".
  2. Add @cn to all accessible websites in category-scholar-!cn (not recommended, as it breaks up the set).
  3. Add @cn to websites with an ISP server in China (my suggestion; a pull request would follow once a proven way is found).
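
To illustrate how option 3 would partition the list, here is a minimal sketch that splits entries by attribute. It assumes the data-file format as I understand it (an optional `type:` prefix, `@attr` tokens after the value, `#` comments); the function names and the sample lines are hypothetical, and tagging webofscience.com with @cn reflects the proposal, not the current state of the repo.

```python
def parse_entry(line):
    """Parse one list line into (rule_type, value, attrs); None for blanks/comments."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    if not line:
        return None
    head, *rest = line.split()
    attrs = {tok[1:] for tok in rest if tok.startswith("@")}
    rule_type, sep, value = head.partition(":")
    if not sep:  # no explicit prefix: treat as a plain domain rule
        rule_type, value = "domain", head
    return rule_type, value, attrs


def split_by_attr(lines, attr="cn"):
    """Return (values with attr, values without attr), skipping include: lines."""
    with_attr, without = [], []
    for line in lines:
        entry = parse_entry(line)
        if entry is None or entry[0] == "include":
            continue
        (with_attr if attr in entry[2] else without).append(entry[1])
    return with_attr, without


# Hypothetical sample, not the real contents of category-scholar-!cn.
SAMPLE = [
    "# hypothetical sample",
    "webofscience.com @cn",
    "arxiv.org",
    "full:muse.jhu.edu",
    "include:elsevier",
]
```

Here `split_by_attr(SAMPLE)` puts webofscience.com in the first list (the would-be category-scholar-!cn@cn) and the remaining domains in the second.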