lilacai / lilac

Curate better data for LLMs
http://lilacml.com
Apache License 2.0

screenshots of issues on HF demo #601

Closed sammcgrail closed 9 months ago

sammcgrail commented 1 year ago

// 1

PNG image

hitting trash on legal-termination

`ApiError
message: Internal Server Error
url: POST /api/v1/concepts/{namespace}/{concept_name}
status: 500
path: { "namespace": "lilac", "concept_name": "legal-termination" }
body: { "remove": [ "811d0dcc92e14c5c881e903c7d4ff7b6" ] }
details:
Traceback (most recent call last):
  File "/home/user/app/lilac/router_utils.py", line 24, in custom_route_handler
    return await original_route_handler(request)
  File "/home/user/.local/lib/python3.9/site-packages/fastapi/routing.py", line 241, in app
    raw_response = await run_endpoint_function(
  File "/home/user/.local/lib/python3.9/site-packages/fastapi/routing.py", line 169, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/user/.local/lib/python3.9/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/user/.local/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/user/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/user/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/user/app/lilac/router_concept.py", line 77, in edit_concept
    return DISK_CONCEPT_DB.edit(namespace, concept_name, change, user)
  File "/home/user/app/lilac/concepts/db_concept.py", line 451, in edit
    raise ConceptAuthorizationException(
lilac.auth.ConceptAuthorizationException: Concept "lilac/legal-termination" does not exist or user does not have access.`

Seems like the same-ish error, slightly different trace.

// 2

PNG image

// 3

Screenshot 2023-08-25 at 4 43 43 PM

Minor improvement, not a bug: the hover target could be the whole card, because it feels like weird UX to have it be just the text.

// 4

PNG image PNG image

Traceback (most recent call last):
  File "/home/user/app/lilac/router_utils.py", line 24, in custom_route_handler
    return await original_route_handler(request)
  File "/home/user/.local/lib/python3.9/site-packages/fastapi/routing.py", line 241, in app
    raw_response = await run_endpoint_function(
  File "/home/user/.local/lib/python3.9/site-packages/fastapi/routing.py", line 169, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/user/.local/lib/python3.9/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/user/.local/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/user/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/user/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/user/app/lilac/router_concept.py", line 77, in edit_concept
    return DISK_CONCEPT_DB.edit(namespace, concept_name, change, user)
  File "/home/user/app/lilac/concepts/db_concept.py", line 451, in edit
    raise ConceptAuthorizationException(
lilac.auth.ConceptAuthorizationException: Concept "lilac/legal-termination" does not exist or user does not have access.

// 5

PNG image

If you fill up the text box on lang_detection and show the preview, and it goes off the page, the padding or CSS for the slider button is touching the bottom of the browser.

bottom padding issue

// 6

PNG image

If you apply a filter it will show something like "10 of 88,393", but potentially you'd just see the second part if there are no filters...
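
One possible reading of the note above, as a minimal sketch: show "N of TOTAL" only when a filter narrows the rows, and only the total otherwise. The function name and parameters are hypothetical and are not taken from the Lilac codebase; this only illustrates the display logic.

```python
from typing import Optional


def format_row_count(total: int, filtered: Optional[int] = None) -> str:
    """Return a row-count label (hypothetical helper, for illustration only).

    `filtered` is the number of rows left after filters are applied,
    or None when no filter is active.
    """
    if filtered is None or filtered == total:
        # No filter is narrowing the view: show just the total.
        return f"{total:,}"
    # A filter is active: show "matching of total".
    return f"{filtered:,} of {total:,}"


print(format_row_count(88393, 10))
print(format_row_count(88393))
```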

// 7

PNG image

seemed to occur when changing "sort by" rapidly... but could not reproduce

// 8

Screenshot 2023-08-25 at 5 09 04 PM

No PaLM env key on HF?

sammcgrail commented 1 year ago

message: Internal Server Error
url: POST /api/v1/concepts/{namespace}/{concept_name}/model/{embedding_name}/score
status: 500
path: { "namespace": "lilac", "concept_name": "source-code", "embedding_name": "gte-small" }
body: { "examples": [ { "text": "menu. {\"smallUrl\":\"https:\/\/www.wikihow.com\/images\/thumb\/a\/ad\/Enable-Automatic-Updates-Step-5Bullet3.jpg\/v4-460px-Enable-Automatic-Updates-Step-5Bullet3.jpg\",\"bigUrl\":\"\/images\/thumb\/a\/ad\/Enable-Automatic-Updates-Step-5Bullet3.jpg\/aid1480351-v4-728px-Enable-Automatic-Updates-Step-5Bullet3.jpg\",\"smallWidth\":460,\"smallHeight\":345,\"bigWidth\":\"728\",\"bigHeight\":\"546\",\"licensing\":\"<div" } ] }
details:
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.9/site-packages/lilac/embeddings/transformer_utils.py", line 18, in get_model
    import torch.backends.mps
ModuleNotFoundError: No module named 'torch'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.9/site-packages/lilac/router_utils.py", line 24, in custom_route_handler
    return await original_route_handler(request)
  File "/home/user/.local/lib/python3.9/site-packages/fastapi/routing.py", line 241, in app
    raw_response = await run_endpoint_function(
  File "/home/user/.local/lib/python3.9/site-packages/fastapi/routing.py", line 169, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/user/.local/lib/python3.9/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/user/.local/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/user/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/user/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/user/.local/lib/python3.9/site-packages/lilac/router_concept.py", line 187, in score
    server_compute_concept(concept_scorer, cast(Iterable[RichData],
  File "/home/user/.local/lib/python3.9/site-packages/lilac/router_utils.py", line 50, in server_compute_concept
    return list(signal.compute(texts))
  File "/home/user/.local/lib/python3.9/site-packages/lilac/batch_utils.py", line 73, in unflatten
    for original_input in original_inputs:
  File "/home/user/.local/lib/python3.9/site-packages/lilac/embeddings/embedding.py", line 37, in _embed_fn
    for item in items:
  File "/home/user/.local/lib/python3.9/site-packages/lilac/embeddings/gte.py", line 47, in compute
    batch_size, model = get_model(self._model_name, _OPTIMAL_BATCH_SIZES[self._model_name])
  File "/home/user/.local/lib/python3.9/site-packages/lilac/embeddings/transformer_utils.py", line 21, in get_model
    raise ImportError('Could not import the "sentence_transformers" python package. '
ImportError: Could not import the "sentence_transformers" python package. Please install it with pip install sentence-transformers.

Screenshot 2023-08-30 at 2 02 51 PM

sammcgrail commented 1 year ago

Screenshot 2023-08-30 at 2 18 34 PM

https://lilacai-lilac.hf.space/signals#text_statistics

`ApiError
message: Internal Server Error
url: POST /api/v1/signals/compute
status: 500
path: undefined
body: { "signal": { "signal_name": "text_statistics" }, "inputs": [ "" ] }
details:
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.9/site-packages/lilac/signals/text_statistics.py", line 45, in setup
    import spacy
ModuleNotFoundError: No module named 'spacy'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.9/site-packages/lilac/router_utils.py", line 24, in custom_route_handler
    return await original_route_handler(request)
  File "/home/user/.local/lib/python3.9/site-packages/fastapi/routing.py", line 241, in app
    raw_response = await run_endpoint_function(
  File "/home/user/.local/lib/python3.9/site-packages/fastapi/routing.py", line 169, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/user/.local/lib/python3.9/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/user/.local/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/user/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/user/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/user/.local/lib/python3.9/site-packages/lilac/router_signal.py", line 81, in compute
    signal.setup()
  File "/home/user/.local/lib/python3.9/site-packages/lilac/signals/text_statistics.py", line 49, in setup
    raise ImportError('Could not import the "spacy" python package. '
ImportError: Could not import the "spacy" python package. Please install it with pip install spacy.`
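
Both missing-module tracebacks above (torch / sentence_transformers for the concept score, spacy for text_statistics) only surface when the endpoint is actually called. As a rough sketch, a deployment could check for these packages up front. The helper name `missing_packages` and the package list are illustrative assumptions, not part of Lilac; only the standard-library `importlib.util.find_spec` call is real.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found for import.

    Hypothetical helper: find_spec() locates a top-level module without
    importing it and returns None when it is not installed.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# Package names taken from the two tracebacks above.
OPTIONAL_PACKAGES = ["torch", "sentence_transformers", "spacy"]

missing = missing_packages(OPTIONAL_PACKAGES)
if missing:
    print("Missing optional packages:", ", ".join(missing))
else:
    print("All optional packages are importable.")
```

Running something like this when the demo image is built would flag the gap before a user hits a 500.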

sammcgrail commented 1 year ago

image

https://lilacai-lilac.hf.space/datasets#lilac/databricks-dolly-15k-curated-en&query=%7B%22searches%22%3A%5B%7B%22path%22%3A%5B%22original-context%22%5D%2C%22type%22%3A%22concept%22%2C%22concept_namespace%22%3A%22lilac%22%2C%22concept_name%22%3A%22non-english%22%2C%22embedding%22%3A%22gte-small%22%7D%5D%7D

on this link, I MASHED the ascending vs descending sort button, and eventually got

message: Internal Server Error
url: POST /api/v1/datasets/{namespace}/{dataset_name}/select_rows
status: 500
path: { "namespace": "lilac", "dataset_name": "databricks-dolly-15k-curated-en" }
body: { "searches": [ { "path": [ "original-context" ], "type": "concept", "concept_namespace": "lilac", "concept_name": "non-english", "embedding": "gte-small" } ], "columns": [ "" ], "combine_columns": true, "sort_by": [ [ "original-context", "lilac/non-english/gte-small", "*", "score" ] ], "sort_order": "ASC", "limit": 20, "offset": 0 }
details:
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.9/site-packages/lilac/router_utils.py", line 24, in custom_route_handler
    return await original_route_handler(request)
  File "/home/user/.local/lib/python3.9/site-packages/fastapi/routing.py", line 241, in app
    raw_response = await run_endpoint_function(
  File "/home/user/.local/lib/python3.9/site-packages/fastapi/routing.py", line 169, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/user/.local/lib/python3.9/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/user/.local/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/user/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2106, in run_sync_in_worker_thread
    return await future
  File "/home/user/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 833, in run
    result = context.run(func, *args)
  File "/home/user/.local/lib/python3.9/site-packages/lilac/router_dataset.py", line 230, in select_rows
    res = dataset.select_rows(
  File "/home/user/.local/lib/python3.9/site-packages/lilac/data/dataset_duckdb.py", line 1048, in select_rows
    df[signal_column] = deep_unflatten(signal_out, input)
  File "/home/user/.local/lib/python3.9/site-packages/lilac/batch_utils.py", line 51, in deep_unflatten
    return cast(list, _deep_unflatten(iter(flat_input), original_input, is_primitive_predicate))
  File "/home/user/.local/lib/python3.9/site-packages/lilac/batch_utils.py", line 44, in _deep_unflatten
    return [_deep_unflatten(flat_input, orig_elem, is_primitive_predicate) for orig_elem in values]
  File "/home/user/.local/lib/python3.9/site-packages/lilac/batch_utils.py", line 44, in <listcomp>
    return [_deep_unflatten(flat_input, orig_elem, is_primitive_predicate) for orig_elem in values]
  File "/home/user/.local/lib/python3.9/site-packages/lilac/batch_utils.py", line 37, in _deep_unflatten
    return next(flat_input)
  File "/home/user/.local/lib/python3.9/site-packages/lilac/data/dataset_utils.py", line 301, in sparse_to_dense_compute
    out = next(dense_output)
  File "/home/user/.local/lib/python3.9/site-packages/lilac/batch_utils.py", line 73, in unflatten
    for original_input in original_inputs:
  File "/home/user/.local/lib/python3.9/site-packages/lilac/embeddings/vector_store.py", line 138, in get
    spans = self._id_to_spans[path_key]
KeyError: ('57d0dd9c82ff4710826a2a8338cbb1d1',)

sammcgrail commented 1 year ago

REPRODUCING PREVIOUS COMMENT ^^^^^

Reproduced here https://lilacai-lilac.hf.space/datasets#lilac/databricks-dolly-15k-curated-en&query=%7B%22searches%22%3A%5B%7B%22path%22%3A%5B%22original-context%22%5D%2C%22type%22%3A%22concept%22%2C%22concept_namespace%22%3A%22lilac%22%2C%22concept_name%22%3A%22non-english%22%2C%22embedding%22%3A%22gte-small%22%7D%5D%2C%22sort_by%22%3A%5B%5B%22original-context%22%2C%22lilac%2Fnon-english%2Fgte-small%22%2C%22*%22%2C%22score%22%5D%5D%2C%22sort_order%22%3A%22ASC%22%7D

ApiError
message: Internal Server Error
url: POST /api/v1/datasets/{namespace}/{dataset_name}/select_rows
status: 500
path: { "namespace": "lilac", "dataset_name": "databricks-dolly-15k-curated-en" }
body: { "searches": [ { "path": [ "original-context" ], "type": "concept", "concept_namespace": "lilac", "concept_name": "non-english", "embedding": "gte-small" } ], "columns": [ "" ], "combine_columns": true, "sort_by": [ [ "original-context", "lilac/non-english/gte-small", "*", "score" ] ], "sort_order": "ASC", "limit": 20, "offset": 0 }
details:
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.9/site-packages/lilac/router_utils.py", line 24, in custom_route_handler
    return await original_route_handler(request)
  File "/home/user/.local/lib/python3.9/site-packages/fastapi/routing.py", line 241, in app
    raw_response = await run_endpoint_function(
  File "/home/user/.local/lib/python3.9/site-packages/fastapi/routing.py", line 169, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/user/.local/lib/python3.9/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/home/user/.local/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/user/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2106, in run_sync_in_worker_thread
    return await future
  File "/home/user/.local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 833, in run
    result = context.run(func, *args)
  File "/home/user/.local/lib/python3.9/site-packages/lilac/router_dataset.py", line 230, in select_rows
    res = dataset.select_rows(
  File "/home/user/.local/lib/python3.9/site-packages/lilac/data/dataset_duckdb.py", line 1048, in select_rows
    df[signal_column] = deep_unflatten(signal_out, input)
  File "/home/user/.local/lib/python3.9/site-packages/lilac/batch_utils.py", line 51, in deep_unflatten
    return cast(list, _deep_unflatten(iter(flat_input), original_input, is_primitive_predicate))
  File "/home/user/.local/lib/python3.9/site-packages/lilac/batch_utils.py", line 44, in _deep_unflatten
    return [_deep_unflatten(flat_input, orig_elem, is_primitive_predicate) for orig_elem in values]
  File "/home/user/.local/lib/python3.9/site-packages/lilac/batch_utils.py", line 44, in <listcomp>
    return [_deep_unflatten(flat_input, orig_elem, is_primitive_predicate) for orig_elem in values]
  File "/home/user/.local/lib/python3.9/site-packages/lilac/batch_utils.py", line 37, in _deep_unflatten
    return next(flat_input)
  File "/home/user/.local/lib/python3.9/site-packages/lilac/data/dataset_utils.py", line 301, in sparse_to_dense_compute
    out = next(dense_output)
  File "/home/user/.local/lib/python3.9/site-packages/lilac/batch_utils.py", line 73, in unflatten
    for original_input in original_inputs:
  File "/home/user/.local/lib/python3.9/site-packages/lilac/embeddings/vector_store.py", line 138, in get
    spans = self._id_to_spans[path_key]
KeyError: ('57d0dd9c82ff4710826a2a8338cbb1d1',)

Screenshot 2023-08-31 at 2 18 03 PM

I think I selected

Screenshot 2023-08-31 at 2 18 19 PM

This option from dropdown, then scrolled so more rows would be rendered, then mashed asc/desc sort button
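
The manual steps above (pick the sort option, then mash asc/desc) might be approximated by firing overlapping `select_rows` requests with alternating sort orders. This is only a sketch under several assumptions: that the API is served from the same host as the demo page, that `"DESC"` is the other accepted sort order, and that `columns` was `["*"]` (the report shows an empty string there). The function names are hypothetical.

```python
import json
import threading
import urllib.error
import urllib.request
from typing import Any, Dict, List

# Assumption: the API lives on the same host as the demo page.
BASE_URL = "https://lilacai-lilac.hf.space"
# Path template from the report, filled in with the reported namespace and dataset.
ENDPOINT = "/api/v1/datasets/lilac/databricks-dolly-15k-curated-en/select_rows"


def build_body(sort_order: str) -> Dict[str, Any]:
    """Build the request body from the report for the given sort order."""
    return {
        "searches": [{
            "path": ["original-context"],
            "type": "concept",
            "concept_namespace": "lilac",
            "concept_name": "non-english",
            "embedding": "gte-small",
        }],
        "columns": ["*"],  # assumption: shown as "" in the report
        "combine_columns": True,
        "sort_by": [["original-context", "lilac/non-english/gte-small", "*", "score"]],
        "sort_order": sort_order,  # "ASC" is from the report; "DESC" is assumed
        "limit": 20,
        "offset": 0,
    }


def post(sort_order: str, statuses: List[int]) -> None:
    """POST one select_rows request and record the HTTP status (-1 on network failure)."""
    request = urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(build_body(sort_order)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            statuses.append(response.status)
    except urllib.error.HTTPError as err:
        statuses.append(err.code)  # e.g. 500 for the error in this report
    except OSError:
        statuses.append(-1)  # unreachable host, timeout, etc.


def mash_sort_button(times: int = 10) -> List[int]:
    """Send `times` overlapping requests, alternating ASC and DESC."""
    statuses: List[int] = []
    threads = [
        threading.Thread(target=post, args=("ASC" if i % 2 == 0 else "DESC", statuses))
        for i in range(times)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return statuses
```

Calling `mash_sort_button()` and looking for a 500 among the returned statuses would be one way to check whether overlapping requests alone trigger the KeyError; whether they do is not established by this thread.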

brilee commented 9 months ago

I believe this was fixed in https://github.com/lilacai/lilac/pull/849