Open sourabhyadav opened 3 years ago
I don't have any reason to believe that this is a FiftyOne issue just yet, though I am still investigating.
Could you provide your PyTorch version? It may be that upgrading PyTorch will solve the issue, although it is only a suggestion.
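For anyone else reporting versions here, the following is a minimal sketch that collects them in one step using only the standard library (Python 3.8+). The helper name `installed_versions` and the package list are illustrative assumptions, not part of FiftyOne.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment
            versions[name] = None
    return versions


# Package list is an assumption; adjust it to whatever is relevant to your report
report = installed_versions(["torch", "torchvision", "fiftyone", "fiftyone-brain"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting the printed lines into the issue avoids typos when copying versions by hand.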
Yeah, it seems there is a problem with model loading. FYI, this is on Windows, so there is no CUDA support.
torch: 1.7.1
torchvision: 0.2.2.post3
Hmm, 0.2.2.post3 is a fairly old version of torchvision. For example, I'm running torchvision==0.8.2 with torch==1.7.1. However, I downgraded to 0.2.2.post3 on macOS (also CPU only) and was able to run compute_uniqueness() with no problem.
Looking at the error message, this seems to be a Windows path problem. The stack trace is complaining that variable\name.weight is expected but variable_name.weight is found instead.
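To narrow down a mismatch like this, it can help to list exactly which state dict keys differ. Below is a minimal sketch in plain Python; the helper `diff_state_dict_keys` is hypothetical, and the key names are placeholders mirroring the ones in the stack trace. In practice the two inputs would presumably be the keys of `model.state_dict()` and the keys of the loaded checkpoint.

```python
def diff_state_dict_keys(expected_keys, found_keys):
    """Return (missing, unexpected) key lists, each sorted for stable output."""
    expected, found = set(expected_keys), set(found_keys)
    missing = sorted(expected - found)      # the model wants these, checkpoint lacks them
    unexpected = sorted(found - expected)   # the checkpoint has these, model does not
    return missing, unexpected


# Placeholder key names mirroring the stack trace; real keys will differ
expected = [r"variable\name.weight", "other.bias"]
found = ["variable_name.weight", "other.bias"]

missing, unexpected = diff_state_dict_keys(expected, found)
print("missing:", missing)
print("unexpected:", unexpected)
```

If every missing key has an unexpected counterpart that differs only by a separator character, that would support the path-rewriting theory.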
Are you able to try upgrading your torchvision version? Perhaps this is something that the torchvision team has fixed in later versions. I briefly checked online and didn't find anything about this...
I updated torch and torchvision to:
torch 1.7.1+cpu
torchaudio 0.7.2
torchvision 0.8.2+cpu
But the issue remains the same. Has anyone tried this on a Windows PC?
Hopefully today, I need to spin up a Windows machine. Apologies.
Any update on this issue?
Hi @sourabhyadav @ShaneGilroy
No update on getting the default model used by compute_uniqueness() to work on Windows yet.
However, let me tell you a secret: you can compute your own embeddings and pass them to the method instead, which will work in any environment. For example, you can use any model that exposes embeddings from the FiftyOne Model Zoo.
Here's a handy FiftyOne CLI command to see what models are available:
fiftyone zoo models list --tags embeddings
And here's an example workflow:
import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz
dataset = foz.load_zoo_dataset("quickstart")
# Remove existing uniqueness field
dataset = dataset.exclude_fields("uniqueness").clone()
session = fo.launch_app(dataset)
# Compute your own embeddings
model = foz.load_zoo_model("mobilenet-v2-imagenet-torch")
embeddings = dataset.compute_embeddings(model)
# Index by uniqueness using pre-computed embeddings
fob.compute_uniqueness(dataset, embeddings=embeddings)
print(dataset)
# Show least unique images in the App
session.view = dataset.sort_by("uniqueness")
I'd also recommend taking a look at visual similarity rather than uniqueness, which many users find to be more useful in practice. Similar idea, but more flexible.
For example, continuing from above:
# Index by visual similarity
fob.compute_similarity(dataset, embeddings=embeddings, brain_key="img_sim")
Then you can use the App to sort by visual similarity to samples of interest, or you can follow this workflow to find near-duplicate images.
@brimoor Thanks for the hint. I am wondering: what is the difference between compute_uniqueness() and compute_similarity() then? Your hint seems to suggest that the methodological approach and main algorithms are the same for both methods. Is that correct? And do the two methods then update the dataset in somewhat different ways, with compute_uniqueness() adding a uniqueness score as a field while compute_similarity() provides some methods to be used on the results?
The two methods both use deep embeddings, but they do slightly different things with them.
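As a rough illustration of that distinction, here is a toy sketch in plain Python. It is not FiftyOne's implementation, and the scoring rule (normalized nearest-neighbor distance) is an assumption chosen for simplicity. The point is only that a uniqueness-style method reduces embeddings to one scalar per sample, while a similarity-style method keeps something you can query.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def uniqueness_scores(embeddings):
    """Toy score per sample: distance to its nearest neighbor, scaled to [0, 1].

    Requires at least two embeddings.
    """
    nearest = [
        min(euclidean(e, other) for j, other in enumerate(embeddings) if j != i)
        for i, e in enumerate(embeddings)
    ]
    top = max(nearest) or 1.0  # avoid dividing by zero when all samples coincide
    return [d / top for d in nearest]


def most_similar(embeddings, query_index, k=3):
    """Toy similarity query: indices of the k samples closest to the query."""
    query = embeddings[query_index]
    ranked = sorted(
        (euclidean(query, e), j)
        for j, e in enumerate(embeddings)
        if j != query_index
    )
    return [j for _, j in ranked[:k]]


# Made-up 2D embeddings: two near-duplicates and one outlier
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
scores = uniqueness_scores(embeddings)
neighbors = most_similar(embeddings, 0, k=1)
print(scores, neighbors)
```

In this toy data the outlier gets the highest score, and the near-duplicate is returned as the closest neighbor of the first sample.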
But the main point here is that the two methods also use different default models to generate embeddings. The recommendation was to try a different model like "mobilenet-v2-imagenet-torch" from our Model Zoo if you are a Windows user trying to use compute_uniqueness(), since that model will work while the default one does not currently seem to work.
Any update on this issue?
Hi @Sejal1506. We don't have an update on getting the default model used by compute_uniqueness() to work on Windows yet. It seems to be some part of PyTorch that is rewriting the state dict. Note that fiftyone-brain is no longer a frozen package, which means full stack traces are available. Sharing any findings could expedite a solution.
Instructions
I tried to find duplicate images in a local dataset. However, I ran into the following issue.
System information
(fiftyone --version): default
Commands to reproduce
Describe the problem
It looks like there is some problem with model loading.
What areas of FiftyOne does this bug affect?
App: FiftyOne application issue
Core: Core fiftyone Python library issue
Server: FiftyOne server issue
Willingness to contribute
The FiftyOne Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the FiftyOne codebase?