Closed paboum closed 8 months ago
and the whole update is broken and the results are lost. This should either retry connections, or present to me the partial results, or use another best effort approach. Simply failing and displaying
(Error)
is not enough.
I apologize for this. In the last update, I set a timeout on connections that was too low. This has already been fixed on dev, along with better error messages, and it will retry a few times if it fails.
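A minimal sketch of the "retry, then keep partial results" behaviour discussed above. This is not the extension's actual code: `check_all` and the `check_one` callback are hypothetical names, and the retry count and delay are assumed values.

```python
import time


def check_all(names, check_one, retries=3, delay=0.0):
    """Check every model and collect failures instead of aborting on the first one.

    check_one is a hypothetical callable that returns update info for one model
    or raises on a network error.
    """
    results, failures = {}, {}
    for name in names:
        for attempt in range(1, retries + 1):
            try:
                results[name] = check_one(name)
                break
            except Exception as err:  # assumption: real code would catch narrower errors
                if attempt == retries:
                    # Give up on this model only; the rest of the batch continues
                    failures[name] = str(err)
                else:
                    time.sleep(delay * attempt)
    return results, failures
```

With this shape, one failing Lora ends up in `failures` while the results for the other models are still available to display.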
Another thing is that calling Civitai API 200 times just to see if anything was updated seems cumbersome.
I do agree that this is a little silly, but as far as I can tell from their API documentation, there is no way to query more than one model at a time in a granular way.
if it has been checked less than a day ago, then skip it.
I can work to implement this, yes.
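The "skip it if it was checked less than a day ago" idea could be sketched roughly as follows. The cache layout (a dict mapping a model key to the time of its last check) and the function names are assumptions for illustration, not something the extension is known to use.

```python
import time

ONE_DAY = 24 * 60 * 60  # seconds


def needs_check(cache, model_key, now=None, max_age=ONE_DAY):
    """Return True if the model was never checked or the last check is too old."""
    now = time.time() if now is None else now
    last_checked = cache.get(model_key)
    return last_checked is None or now - last_checked >= max_age


def mark_checked(cache, model_key, now=None):
    """Record that the model was just checked (cache is a plain dict)."""
    cache[model_key] = time.time() if now is None else now
```

The dict could be saved to a JSON file between runs so that a second "check for updates" on the same day only touches models that were skipped or failed earlier.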
just have a microservice query for top 5000 loras daily
I don't really have the resources to do this.
just checking the sha hash of a model isn't the best method to identify it.
The checksum is only used for fetching the initial information via scanning all models without model information. There is no way to determine the origin of a user-created model (which is what a user-pruned model is). As long as the pruned model maintains the same name or the metadata is renamed to match that of the pruned model, the model information should be preserved. But doing that automatically would require writing compatibility code for every extension that can prune models, and there would be no way to account for external tools.
I apologize for this. In the last update, I set a timeout on connections that was too low. This has already been fixed on dev, along with better error messages, and it will retry a few times if it fails.
This seems to have helped. Thanks!
I do agree that this is a little silly, but as far as I can tell from their API documentation, there is no way to query more than one model at a time in a granular way.
Not sure if this helps, but perhaps they would be interested in adapting their API to customers' needs? After all, they want us to use their API (it's working), but surely don't want their servers overloaded by suboptimal queries. I would try and contact them.
The checksum is only used for fetching the initial information via scanning all models without model information. There is no way to determine the origin of a user-created model (which is what a user-pruned model is). As long as the pruned model maintains the same name or the metadata is renamed to match that of the pruned model, the model information should be preserved. But doing that automatically would require writing compatibility code for every extension that can prune models, and there would be no way to account for external tools.
This seems to be a good reason to suggest a change in .safetensors file format (possibly others too) to include a manifest which would state the model author's signature, model version and/or generation date, perhaps also the other information Civitai Helper struggles to gather in other ways. Apparently there are hundreds of thousands of models incoming and the community needs some level of structure while installing, updating and using them. Civitai seems to be a good place to start such effort. Perhaps they are already working on something like this?
This seems to be a good reason to suggest a change in .safetensors file format (possibly others too) to include a manifest which would state the model author's signature, model version and/or generation date, perhaps also the other information Civitai Helper struggles to gather in other ways.
The safetensors format does allow for storing metadata in its header of an arbitrary length, but the bigger issue is convincing model authors to use it. I suppose Civitai could edit the model headers themselves, but that would also complicate identifying a model, since any changes to the file header would result in changes to the hash, which also doubles as a security feature: without the same hash, you have no way of verifying Civitai hasn't injected something dangerous into a model post-upload.
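For illustration, a small sketch of reading that header metadata. It relies on the safetensors layout (an 8-byte little-endian header length, followed by a JSON header whose optional `__metadata__` key holds string-to-string pairs). The function names and the `author`/`version` keys are made up for the demo.

```python
import io
import json
import struct


def read_metadata(stream):
    """Return the free-form __metadata__ dict from a safetensors byte stream."""
    (header_len,) = struct.unpack("<Q", stream.read(8))
    header = json.loads(stream.read(header_len).decode("utf-8"))
    return header.get("__metadata__", {})


def make_fake_file(metadata, tensor_bytes=b""):
    """Build a tiny in-memory stand-in for a .safetensors file (demo only)."""
    header = json.dumps({"__metadata__": metadata}).encode("utf-8")
    return io.BytesIO(struct.pack("<Q", len(header)) + header + tensor_bytes)
```

For a real file, the same function works on `open(path, "rb")`. Whether any useful keys are present still depends on the model author having written them.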
There is a usable function in webui that only hashes content after the header, but the issue with that is that there's no way to look up a model with that hash, and it would still be changed by model pruning.
I've decided to stop pruning Loras, as only checkpoint size impacts memory significantly. Trying to restore all original Loras now.
With limited success, I was able to recollect some original models with a script based on this idea:
l=some_lora_name
# Quote the URL so the shell does not treat "?" as a glob character
curl -s "https://civitai.com/api/v1/models?query=$l" > temp
# Emit one "id:filename" line per file of each model version
jq -r '.items[].modelVersions[] | { id, "name" : .files[].name } | join(":")' temp | while read -r s
do
    id=`echo "$s" | cut -d ':' -f 1`
    filename=`echo "$s" | cut -d ':' -f 2`
    if [ "$l.safetensors" = "$filename" ]
    then wget -qO "$filename" "https://civitai.com/api/download/models/$id"
    fi
done
This mostly fails on filenames that include version numbers, which also come in different formats. I am currently experimenting with various sed commands to trim -V10, _v1.2 and similar suffixes.
Perhaps Civitai Helper could include a similar heuristic for the Loras that can't be found based on their hash.
Another approach would be to simply create an index, a dictionary from strings (filenames, or hashes, or both) to model id numbers. This would be an append-only data structure and could even be hardcoded in the source code. The user could then choose whether they want a fast offline query or a deep and up-to-date online check.
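One possible shape for such an index, sketched under assumptions: entries are stored as JSON lines (one `{"key": ..., "id": ...}` object per line), keys are lowercased filenames or hashes, and all function names are hypothetical.

```python
import json


def add_entry(lines, key, model_id):
    """Append one index entry; existing lines are never rewritten."""
    lines.append(json.dumps({"key": key.lower(), "id": model_id}))


def build_lookup(lines):
    """Turn the append-only lines into a dict for fast offline queries."""
    lookup = {}
    for line in lines:
        entry = json.loads(line)
        # The first entry for a key wins, so later appends cannot change old answers
        lookup.setdefault(entry["key"], entry["id"])
    return lookup


def find_model_id(lookup, filename=None, sha256=None):
    """Try the hash first, then fall back to the filename."""
    for key in (sha256, filename):
        if key and key.lower() in lookup:
            return lookup[key.lower()]
    return None
```

A miss (`None`) would be the signal to fall back to the slower online check.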
I've managed to improve the above to recollect ~50% of my Loras, that's good for now. The key part:
echo "$s" | sed -e "s:[-_ \.]*[vV]\?[0-9\.]\+[a-z]\?$::" -e "s:_: :g" -e "s:\([A-Z]\): \1:g" |
Other than this, I suggested that in Model Toolkit they preserve original file information so that Civitai Helper can access it (https://github.com/arenasys/stable-diffusion-webui-model-toolkit/issues/41) and that Civitai allows API filename search (https://github.com/orgs/civitai/discussions/183#discussioncomment-7257089).
The key part:
echo "$s" | sed -e "s:[-_ \.]*[vV]\?[0-9\.]\+[a-z]\?$::" -e "s:_: :g" -e "s:\([A-Z]\): \1:g" |
I think I understand most of this except the last "s:\([A-Z]\): \1:g"
It adds a space before each capitalized letter, e.g. SlawomirMentzen becomes Slawomir Mentzen. I couldn't find that model without it with this API (as it searches for the Lora title, not the filename).
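For readers less familiar with sed, the three substitutions can be mirrored in Python roughly like this. The function name is hypothetical, and the final whitespace cleanup is an addition that the sed one-liner does not perform.

```python
import re


def to_search_title(filename_stem):
    """Approximate a Lora title from a filename without its extension."""
    # 1. Drop a trailing version suffix such as "-V10" or "_v1.2"
    title = re.sub(r"[-_ .]*[vV]?[0-9.]+[a-z]?$", "", filename_stem)
    # 2. Underscores become spaces
    title = title.replace("_", " ")
    # 3. Insert a space before every capital letter
    title = re.sub(r"([A-Z])", r" \1", title)
    # Extra step (not in the sed version): collapse repeated whitespace
    return re.sub(r"\s+", " ", title).strip()


print(to_search_title("SlawomirMentzen_v1.2"))  # Slawomir Mentzen
print(to_search_title("some_lora-V10"))         # some lora
```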
Btw, I've found the private API that Civitai's webpage uses: the https://meilisearch-new.civitai.com/multi-search endpoint, with the Authentication Bearer token from my web browser session. It is possible to use it with https://github.com/lwthiker/curl-impersonate to obtain the id number for almost every filename, like:
curl_chrome116 ... --data-raw '{"queries":[{"q":"elevator_v0.4-locon-000007", "indexUid":"models_v2"}]}' > temp
cat temp | jq '.results[].hits[].id' | while read id
curl https://civitai.com/api/v1/models/$id > temp2
cat temp2 | jq '.modelVersions[] | { id, "name" : .files[].name, id } | join(":")' | ...
but I cannot recommend putting it in Civitai Helper's code, for several reasons: a) The user would need to provide their authentication bearer token, which may be too difficult. b) Meilisearch seems to be their bottleneck and they probably pay for it, so they may be very unhappy if it is used outside their web frontend, bypassing the ads. c) It's cumbersome, to say the least. Most users should probably wait until they expand the API as suggested in https://github.com/orgs/civitai/discussions/183#discussioncomment-7257089
Yeah, I agree that we're not going to use it in this extension. If they wanted to provide a version of that for us to use, they would have documented it.
Alright, I've re-written your shell script into python, taking some liberties to make it a bit more general-use. I'll see about integrating it in the future:
""" download_model_by_name.py
Downloads a model using only the model's filename.
"""
import os
import time
import platform
import re
import sys
import requests
import urllib3
default_headers = {
    "User-Agent": (
        "Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"
    )
}
SERVICE = "https://civitai.com/api/"
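def get_url(url, retries=0, headers=None):
    urllib3.disable_warnings()
    # Build a fresh dict per call; the default headers override caller values
    headers = {**(headers or {}), **default_headers}
    try:
        response = requests.get(
            url,
            stream=True,
            verify=False,
            headers=headers,
            timeout=100
        )
    except requests.exceptions.Timeout:
        # requests raises its own Timeout class, not the builtin TimeoutError
        print("Request timed out :(")
        return None
    if not response.ok:
        print(f"GET Request failed with {response.status_code}")
        if response.status_code == 404:
            return None
        if retries < 3:
            print("Retrying")
            # Count the attempt so the recursion stops after three retries
            return get_url(url, retries + 1, headers)
        # Out of retries: report failure instead of returning a bad response
        return None
    print("GET Request success!")
    return response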
def write_file(response, filename):
    downloaded_size = 0
    total_size = int(response.headers['Content-Length'])
    start = time.time()
    with open(filename, "wb") as dl_file:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                dl_file.write(chunk)
                dl_file.flush()
                # The rest of this is just a progress bar
                downloaded_size += len(chunk)
                elapsed = time.time() - start
                speed = downloaded_size // elapsed if elapsed >= 1 else downloaded_size
                # Mac reports filesizes in multiples of 1000
                unit = 1000 if platform.system() == "Darwin" else 1024
                i = 0
                while speed > unit:
                    i = i + 1
                    speed = speed / unit
                    if i >= 3:
                        break
                speed = round(speed, 2)
                multiple = ["", "K", "M", "G"][i]
                # progress
                progress = int(100 * downloaded_size / total_size)
                completed = "-" * min(progress // 2, 50)
                remaining = " " * max(50 - (progress // 2), 0)
                sys.stdout.write(f"\r[{completed}{remaining}] {progress: 3}% @ {speed}{multiple}Bps")
                sys.stdout.flush()
    print("\n")
"""
# Alternative file write with tqdm progressbar, requires tqdm:
def write_file(data, filename):
    from tqdm import tqdm
    # Total download size, taken from the response headers (0 if missing)
    total_size = int(data.headers.get("Content-Length", 0))
    with open(filename, "wb") as dl_file, tqdm(
        total=total_size,
        unit='iB',
        unit_scale=True,
        unit_divisor=1024
    ) as progress_bar:
        for chunk in data.iter_content(chunk_size=1024):
            if chunk:
                downloaded_size = dl_file.write(chunk)
                # write to disk
                dl_file.flush()
                progress_bar.update(downloaded_size)
"""
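def model_name_to_service_name(model_name):
    service_name = model_name.replace("_", " ")
    # Strip a trailing version suffix such as " v0.4-locon-000007"
    service_name = re.sub(
        r"[- .]*[v]?[0-9\.]+(?:[a-z0-9\-]*)?$",
        "",
        service_name,
        # Must be a keyword: the fourth positional argument of re.sub is count
        flags=re.I
    )
    # Insert a space before each capital letter
    service_name = re.sub(
        r"([A-Z])",
        lambda x: f" {x.group(0)}",
        service_name
    ).strip()
    service_name = re.sub(r"\s\s+", " ", service_name)
    return service_name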
def download_model(model_path):
    # This does not check if the model_path actually exists
    filename = model_path
    if "/" in filename:
        # remove path prefix from filename
        filename = filename.split("/").pop(-1)
    # remove extension
    model_name, _ = os.path.splitext(filename)
    api_query = f"""{SERVICE}v1/models?query={
        model_name_to_service_name(model_name)
    }"""
    response = get_url(api_query)
    if not response:
        print(f"Could not get model info for {filename} :(")
        return
    model_info = response.json()
    if len(model_info.get("items", [])) == 0:
        print("No models found.")
        return
    for item in model_info["items"]:
        for version in item["modelVersions"]:
            for version_data in version["files"]:
                version_filename = version_data["name"]
                version_id = version_data["id"]
                print(f"Found model {version_id}: {version_filename}")
                if version_filename in [f"{model_name}.{x}" for x in ["safetensors", "ckpt"]]:
                    model_url = version_data["downloadUrl"]
                    version_file = get_url(model_url, headers={"Content-Disposition": None})
                    if version_file:
                        write_file(version_file, model_path)
                        print(f"{filename} saved!")
                        return
if __name__ == "__main__":
    download_model(sys.argv[1])
I will test it on my side after the weekend and let you know.
Meanwhile I created this ticket: https://github.com/bmaltais/kohya_ss/issues/1601, which I believe can help avoid similar issues in the future if it becomes a standard.
The main part of this issue should be resolved in the latest version. Changes to model update code will be addressed at a later time. If you wish to track that particular feature, feel free to open a new issue as a feature request. However, I do not wish to give users browsing the issue list the impression that updating is still broken.
I have like 200 Loras installed. When I try to "Check models' new version", it is enough that one of them fails with:
and the whole update is broken and the results are lost. This should either retry connections, or present to me the partial results, or use another best effort approach. Simply failing and displaying
(Error)
is not enough. The only workaround I can think of right now is moving most Loras out of my setup and running the update for the partial set, and maybe it will succeed.
Another thing is that calling Civitai API 200 times just to see if anything was updated seems cumbersome. First of all, we could assume that each Lora is updated at most once a day: if it has been checked less than a day ago, then skip it. This would resolve my issue btw. Then, a bulk query to the API should be possible, asking for the most recent versions of each Lora in the set. If their API doesn't allow that (and they refuse to improve it), then caching such results would be prudent: just have a microservice query for top 5000 loras daily, and then Civitai Helper would fetch that result instead of querying Civitai. They benefit from this too, as their API would only get queried 5000 times daily, instead of all users asking for all Loras multiple times.
Finally, as a side remark, just checking the sha hash of a model isn't the best method to identify it. I use https://github.com/arenasys/stable-diffusion-webui-model-toolkit to prune unnecessary data from checkpoint models, which makes them easier to squeeze into my VRAM. Now, if I used Civitai Helper to find their updates, it would yield errors because the hash is different. Either Civitai Helper should somehow know what the checksum of a model was before pruning, or calculate the checksum of the part that is never pruned, or use the above-mentioned microservice to translate various hashes into their normalised form.